diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5706242..5a00219 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,15 +6,23 @@ The format is based on Keep a Changelog and this project follows Semantic Versio
 
 ## [Unreleased]
 
+### Added
+
+- **Long-context Python function retrieval benchmark** — added seven built-in context-window templates linked to file-backed Python function-retrieval datasets, with front/middle/late function placement, two-function retrieval, and a negative control.
+- **Long-context Python needle benchmark** — added seven built-in context-window templates linked to file-backed Python positional-recall datasets, with front/middle/late needle placement, 4k-256k context sizes, two-fact retrieval, and a negative control.
+
 ### Changed
 
 - **Changelog category workflow** — `AGENTS.md` now requires changelog updates to preserve Keep a Changelog category headings and place entries under the appropriate `Added`, `Changed`, `Fixed`, `Removed`, or `Security` section instead of flattening release notes.
+- **Run fatal upstream errors** — Run-created benchmark profiles now cancel on the first fatal upstream error, context-window retrieval stops on the first failed item, and HTTP diagnostics preserve upstream provider codes such as `prefill_memory_exceeded`.
+- **Run template capability filtering** — Run now disables benchmark templates that exceed a selected model's declared context window or require tool calling when the selected model/server is not tool-capable.
+- **Run audit and functional checks split** — Run now separates pipeline execution health from functional benchmark checks, and treats missing required terms as a visible functional failure when exact matching is disabled.
+- **Run functional failure clue** — Run now surfaces a benchmark assertion failure line when quality metrics fail despite a technically completed run, with categories such as invalid tool arguments or missing tool calls.
 
 ## [0.10.0] - 2026-06-19
 
 ### Added
 
-- **Run functional failure clue** — Run now surfaces a benchmark assertion failure line when quality metrics fail despite a technically completed run, with categories such as invalid tool arguments or missing tool calls.
 - **Datasets editor checkpoint** — added a Datasets page and JSONL dataset-file API for creating, editing, saving, and deleting dataset item files under `INFERHARNESS_BENCHMARK_DATASET_ROOT`, with synced `dataset_manifest` documents, copy-down editing for repeated fields, and clamped long-prompt display.
 - **Tool-call assertion metric** — benchmark tool-call templates now include `tool_call_assertion_pass`, a single-turn pass/fail metric requiring exact expected tool selection and structurally matching arguments while keeping assertion failures as quality metrics rather than execution failures.
 - **Tool-call assertion UI** — Run now promotes tool-call assertion pass/fail as the primary correctness verdict, and Templates groups metrics with readable labels while auto-adding the assertion metric when tool calling is enabled.
diff --git a/README.md b/README.md
index 7b4586e..1a9d02b 100644
--- a/README.md
+++ b/README.md
@@ -81,7 +81,7 @@ This means a result is more than a screenshot or a manually copied answer. It is
 Register local or remote inference servers, discover available models, and maintain a model catalog with provider, format, quantization, capabilities, and base-model metadata.
 
 **Reusable test definitions**
-Start with built-in benchmark templates, then create tests for one prompt, a dataset loop, tool-calling behavior, structured output, or multi-model comparisons.
+Start with built-in benchmark templates, then create tests for one prompt, a dataset loop, tool-calling behavior, long-context needle or function retrieval, structured output, or multi-model comparisons.
 Benchmark documents are persisted as JSON in a file-backed library and indexed into SQLite for runtime use.
 Built-in documents ship with the app, while user-created templates, datasets, runtime profiles, and plans are written to a local library directory so they can be restored if the database is rebuilt.
 
@@ -91,6 +91,8 @@ Use the Templates page agent as the primary authoring flow to challenge underspe
 **Benchmark runs**
 Run the same test against one model, many models, or the same model served by different inference servers.
 When a selected template has a unique linked `dataset_manifest`, Run uses that manifest automatically instead of creating a prompt or file-backed dataset manifest.
+Run disables templates that exceed a selected model's declared context window or require unsupported tool calling.
+Run separates execution health from functional checks so a technically completed pipeline can still show failed retrieval, schema, or tool-call assertions.
 
 **Automated metrics**
 Capture time to first token, total latency, prefill/decode timing, prompt tokens, completion tokens, and tokens per second.
diff --git a/backend/data/datasets/context-function-retrieval-128k.jsonl b/backend/data/datasets/context-function-retrieval-128k.jsonl
new file mode 100644
index 0000000..a98ac41
--- /dev/null
+++ b/backend/data/datasets/context-function-retrieval-128k.jsonl
@@ -0,0 +1,5 @@
+{"id":"function-front-128k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-front-128k\nApproximate target context: 128000 tokens.\nReturn the complete source code of the Python function or method `_constructor_from_mgr`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n \n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import 
import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom 
pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n 
ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide <basics.dataframe>` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series or a one-dimensional array, return the matrix\n product between self and other as a Series. If other is a DataFrame\n or a two-dimensional array, return the matrix product of self and\n other as a DataFrame.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the index of the Series does not change the result,\n because labels are aligned before the multiplication.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :func:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
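<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>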
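\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>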
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at the passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col are interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, in64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop : Drop specified labels from rows or columns.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with ``how``.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - ``False`` : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series marking each duplicated row.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``.\n If there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``.\n If there are duplicate values for the largest element, all the\n ties are kept:\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels of the row index (``axis=0``, the default),\n but the levels of the columns can be swapped in a similar manner with\n ``axis=1``. By not supplying any arguments for i and j, we swap the last\n and second to last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version, which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version, which returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version, which returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version, which returns the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list with each element\n be str or tuple, and all specified columns their list-like data\n on same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If specified columns to explode is empty list.\n * If specified columns to explode have not matching count of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a DataFrame from list-like columns to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the Series.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n```\n
","tags":["context-window","function-retrieval","python","front","128k"],"expected_answer":["def _constructor_from_mgr(self, mgr, axes) -> DataFrame:","df = DataFrame._from_mgr(mgr, axes=axes)","if type(self) is DataFrame:","return df"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"function_name":"_constructor_from_mgr","function_position":"front","evaluation_mode":"function_required_terms","expected_full_answer":" def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)"}} +{"id":"function-middle-128k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-middle-128k\nApproximate target context: 128000 tokens.\nReturn the complete source code of the Python function or method `_arith_method`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the result's dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n <https://pandas.pydata.org/docs/user_guide/io.html?highlight=storage_options#reading-writing-remote-files>`_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n <https://pandas.pydata.org/docs/user_guide/io.html?highlight=storage_options#reading-writing-remote-files>`_.\n\n **kwargs\n These parameters will be passed to `tabulate <https://pypi.org/project/tabulate>`_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate <https://pypi.org/project/tabulate>`_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n <https://pandas.pydata.org/docs/user_guide/io.html?highlight=storage_options#reading-writing-remote-files>`_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io <io.parquet>` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels.
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
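<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>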
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ...
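)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n ..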
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ...
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. 
By default it is inserted into the first\n level.\n col_fill : object, default \n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n 
set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n 
ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from 
pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"
<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None, uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
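 <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>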
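 \n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>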
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\" and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows, assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible\n for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` modifies the DataFrame in place and does\n not return a new object. In contrast to ``frame.iloc[:, loc] = value``,\n which will try to update the existing values in place,\n ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place; it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[loc]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at the passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Whether index/col are interpreted as positional indexers.\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that, if we don't have an index, we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This engine evaluates pandas objects using\n numexpr for large speedups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default\n```\n
","tags":["context-window","function-retrieval","python","middle","128k"],"expected_answer":["def _arith_method(self, other, op) -> DataFrame:","if self._should_reindex_frame_op(other, op, 1, None, None):","return self._arith_method_with_reindex(other, op)","axis: Literal[1] = 1 # only relevant for Series other case"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"function_name":"_arith_method","function_position":"middle","evaluation_mode":"function_required_terms","expected_full_answer":" def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)"}} +{"id":"function-late-128k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-late-128k\nApproximate target context: 128000 tokens.\nReturn the complete source code of the Python function or method `_reindex_for_setitem`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is a NumPy array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how reordering the index of the Series does not change the result,\n because labels are aligned before multiplying.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
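<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>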
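\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>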
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
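)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 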
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for\n checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add a series to the DataFrame in the specified column.\n\n If the series is a NumPy array (not a Series/TimeSeries), it must be the\n same length as the DataFrame's index or an error will be raised.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at the passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Whether index/col are interpreted as positional indexers.\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : Evaluates pandas objects using numexpr for large\n speed ups in complex expressions with large frames. This is the\n engine that the default of None tries first.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine whether a row or column is removed from the DataFrame when it\n has at least one NA value or only NA values.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Keep only rows or columns with at least this many non-NA values.\n Cannot be combined with ``how``.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - ``False`` : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series marking each duplicated row as ``True``.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to False and all others to True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to False and all others to True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``.\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept:\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we swap the levels of the row index (``axis=0``, the default), but\n column levels can be swapped in a similar manner by passing ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broa\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the result's dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n ","tags":["context-window","function-retrieval","python","late","128k"],"expected_answer":["def _reindex_for_setitem(","if value.index.equals(index) or not len(index):","if isinstance(value, Series):","return value._values, value._references"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"function_name":"_reindex_for_setitem","function_position":"late","evaluation_mode":"function_required_terms","expected_full_answer":"def _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None"}} +{"id":"function-two-blocks-128k","system_prompt":"You are a strict code retrieval engine. Return only the requested code blocks or NOT_FOUND.","prompt":"Context-window function retrieval item: function-two-blocks-128k\nApproximate target context: 128000 tokens.\nReturn the complete source code for `_construct_result` first, then a blank line, then the complete source code for `_to_dict_of_blocks`. For each function, include only its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend using the\n interchange protocol only in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Return the number of rows, i.e. the length of the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a 2-D numpy array, return\n the matrix product of self and other as a DataFrame; a 1-D array\n yields a Series.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the index of other does not change the result,\n because labels are aligned before multiplying.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n \n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n 
set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n 
ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from 
pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool =
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame (or as a Series\n when the result is one-dimensional).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
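<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>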
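\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>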
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ...
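)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n ..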
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at the passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Whether index/col are interpreted as positional indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : Evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from NumPy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n \n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n 
construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import 
BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass 
DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(, {'col1': 1, 'col2': 0.5}),\n defaultdict(, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n```\n
","tags":["context-window","function-retrieval","python","two-functions","128k"],"expected_answer":["def _construct_result(self, result, other) -> DataFrame:","out = self._constructor(result, copy=False).__finalize__(self)","out.columns = self.columns","out.index = self.index","def _to_dict_of_blocks(self) -> dict[str, DataFrame]:","mgr = self._mgr","return {","k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"function_names":["_construct_result","_to_dict_of_blocks"],"function_position":"two_functions_20_and_80_percent","evaluation_mode":"two_function_required_terms","expected_full_answer":" def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }"}} +{"id":"function-negative-control-128k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-negative-control-128k\nApproximate target context: 128000 tokens.\nThe source may or may not contain a Python function named `_inferharness_missing_context_probe`. 
If the function is absent, reply exactly: NOT_FOUND.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". Default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or fewer.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or fewer.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of names of string columns to convert to the Stata StrL\n format. Only available if version is None or at least 117. Storing\n strings in the StrL format can produce smaller dta files if strings\n have more than 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A CSS id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
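<table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th></th>\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <th>0</th>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <th>1</th>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>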
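\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>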
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe : Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage : Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for _, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible\n for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``:\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``:\n if there are duplicate values for the largest element, all the\n ties are kept:\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we swap levels of the row index (``axis=0``, the default), but levels\n of the columns can be swapped in a similar manner with ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can, for example, swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, for example, we swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is column to aggregate and the value is\n function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each\n element is a str or tuple; the list-like data of all specified\n columns must have matching lengths on each row of the frame.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns do not have matching element counts\n on each row of the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the Series.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a Dataframe elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype\n```\n
","tags":["context-window","function-retrieval","python","negative-control","128k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"function_name":"_inferharness_missing_context_probe","function_position":"absent","evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-function-retrieval-16k.jsonl b/backend/data/datasets/context-function-retrieval-16k.jsonl new file mode 100644 index 0000000..99dc7ce --- /dev/null +++ b/backend/data/datasets/context-function-retrieval-16k.jsonl @@ -0,0 +1,5 @@ +{"id":"function-front-16k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-front-16k\nApproximate target context: 16000 tokens.\nReturn the complete source code of the Python function or method `_constructor_from_mgr`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom 
collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom 
pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n 
Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. 
This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including 
Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n```\n
","tags":["context-window","function-retrieval","python","front","16k"],"expected_answer":["def _constructor_from_mgr(self, mgr, axes) -> DataFrame:","df = DataFrame._from_mgr(mgr, axes=axes)","if type(self) is DataFrame:","return df"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"function_name":"_constructor_from_mgr","function_position":"front","evaluation_mode":"function_required_terms","expected_full_answer":" def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)"}} +{"id":"function-middle-16k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-middle-16k\nApproximate target context: 16000 tokens.\nReturn the complete source code of the Python function or method `_arith_method`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = \n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for 
potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n 
is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n 
get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. 
The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None =\n```\n
","tags":["context-window","function-retrieval","python","middle","16k"],"expected_answer":["def _arith_method(self, other, op) -> DataFrame:","if self._should_reindex_frame_op(other, op, 1, None, None):","return self._arith_method_with_reindex(other, op)","axis: Literal[1] = 1 # only relevant for Series other case"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"function_name":"_arith_method","function_position":"middle","evaluation_mode":"function_required_terms","expected_full_answer":" def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)"}} +{"id":"function-late-16k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-late-16k\nApproximate target context: 16000 tokens.\nReturn the complete source code of the Python function or method `_reindex_for_setitem`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be cast, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike `len()`, which only returns the number of rows, `shape`\n provides both row and column counts, making it more informative for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how reordering the labels of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, 
copy=Fals\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom 
pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n 
DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n 
from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. \"\n```\n
","tags":["context-window","function-retrieval","python","late","16k"],"expected_answer":["def _reindex_for_setitem(","if value.index.equals(index) or not len(index):","if isinstance(value, Series):","return value._values, value._references"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"function_name":"_reindex_for_setitem","function_position":"late","evaluation_mode":"function_required_terms","expected_full_answer":"def _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None"}} +{"id":"function-two-blocks-16k","system_prompt":"You are a strict code retrieval engine. Return only the requested code blocks or NOT_FOUND.","prompt":"Context-window function retrieval item: function-two-blocks-16k\nApproximate target context: 16000 tokens.\nReturn the complete source code for `_construct_result` first, then a blank line, then the complete source code for `_to_dict_of_blocks`. For each function, include only its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated \n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return 
out\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n 
is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series 
import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. 
The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n \n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy 
import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction 
import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n 
ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated a\n```\n
","tags":["context-window","function-retrieval","python","two-functions","16k"],"expected_answer":["def _construct_result(self, result, other) -> DataFrame:","out = self._constructor(result, copy=False).__finalize__(self)","out.columns = self.columns","out.index = self.index","def _to_dict_of_blocks(self) -> dict[str, DataFrame]:","mgr = self._mgr","return {","k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"function_names":["_construct_result","_to_dict_of_blocks"],"function_position":"two_functions_20_and_80_percent","evaluation_mode":"two_function_required_terms","expected_full_answer":" def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }"}} +{"id":"function-negative-control-16k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-negative-control-16k\nApproximate target context: 16000 tokens.\nThe source may or may not contain a Python function named `_inferharness_missing_context_probe`. 
If the function is absent, reply exactly: NOT_FOUND.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict lik\n```\n
","tags":["context-window","function-retrieval","python","negative-control","16k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"function_name":"_inferharness_missing_context_probe","function_position":"absent","evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-function-retrieval-256k.jsonl b/backend/data/datasets/context-function-retrieval-256k.jsonl new file mode 100644 index 0000000..53858d9 --- /dev/null +++ b/backend/data/datasets/context-function-retrieval-256k.jsonl @@ -0,0 +1,5 @@ +{"id":"function-front-256k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-front-256k\nApproximate target context: 256000 tokens.\nReturn the complete source code of the Python function or method `_constructor_from_mgr`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike \n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom 
pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame (or a Series if\n the result is one-dimensional).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how reordering the Series index does not change the result,\n because the labels are aligned before multiplying.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter has no effect (the index item is never\n included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True``, the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
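<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>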
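\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>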
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported, not later versions.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are true; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change the input DataFrame (though pandas does not check this).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n ``errors='raise'``.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't raise an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop : Drop specified labels from rows or columns.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to False and all others to True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to False and all others to True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``;\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``;\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see the hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You can also pass a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level holds the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list with each element\n be str or tuple, and all specified columns their list-like data\n on same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If specified columns to explode is empty list.\n * If specified columns to explode have not matching count of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a DataFrame from list-like columns to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the Series.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `_\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional positional arguments have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the DataFrame.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimated covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimated correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a Series or\n along a DataFrame axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA value with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA value with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Let's be explicit about this.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the mode of wings\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n Setting ``dropna=False`` ``NaN`` values are considered and they can be\n the mode (like for wings).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode over columns and not rows, use the axis parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024\t 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Using `freq` which is the offset that the Timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either the index or the\n columns can be converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy representation of the DataFrame.\n\n .. 
warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned; the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieve the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n For example, if the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns (e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially 
mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n 
is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n 
lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. 
The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information is part of the input data and no index is provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is a DataFrame, this argument is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting.
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires the `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently, timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
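<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>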
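\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>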
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows, assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop : Drop specified labels from rows or columns.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count\n```\n
","tags":["context-window","function-retrieval","python","front","256k"],"expected_answer":["def _constructor_from_mgr(self, mgr, axes) -> DataFrame:","df = DataFrame._from_mgr(mgr, axes=axes)","if type(self) is DataFrame:","return df"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"function_name":"_constructor_from_mgr","function_position":"front","evaluation_mode":"function_required_terms","expected_full_answer":" def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)"}} +{"id":"function-middle-256k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-middle-256k\nApproximate target context: 256000 tokens.\nReturn the complete source code of the Python function or method `_arith_method`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling the index of the Series does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
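<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>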
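\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>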
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This engine evaluates pandas objects using\n numexpr for large speedups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level Python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc.) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Note that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n modify the input DataFrame (pandas does not check this).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to propagate the next valid value backward to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index label\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values.
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ...
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ...
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating whether each row is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to False and all others to True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to False and all others to True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar using the operator version, which returns the same\n results as the method.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with the operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar using the operator version, which returns the same\n results as the method.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with the operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is column to aggregate and the value is\n function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the specified columns to explode are an empty list.\n * If the specified columns to explode do not have matching counts of\n elements row-wise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like Series to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a list-like or dict-like of funcs\n and the func isn't a string.\n If \"compat\", pandas first tries to translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, it tries to call apply again with\n ``by_row=True`` and, if that fails, calls apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `_\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n \n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport 
functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import 
(\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n 
import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. 
Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], 
index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be cast, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` function, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative attribute for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
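<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>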
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
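<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>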
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
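)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 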
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions <df-memory-usage>` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions <df-memory-usage>` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype)
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non-stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop : Drop specified labels from rows or columns.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise, but levels can be swapped column-wise\n in a similar manner. Note that row-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar; the operator version returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar; the operator version returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar; the operator version returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract from a scalar; the operator version returns the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract from a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar; the operator version returns the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two Series as inputs and returns a Series or a\n scalar. Used to merge the two DataFrames column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the DataFrames.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n DataFrame keeps the values of the 'first' DataFrame and overrides the\n values of the second one where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is column to aggregate and the value is\n function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list with each element\n be str or tuple, and all specified columns their list-like data\n on same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If specified columns to explode is empty list.\n * If specified columns to explode have not matching count of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a DataFrame from list-like columns to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the Series.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # lest for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a Dataframe elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n```\n
","tags":["context-window","function-retrieval","python","middle","256k"],"expected_answer":["def _arith_method(self, other, op) -> DataFrame:","if self._should_reindex_frame_op(other, op, 1, None, None):","return self._arith_method_with_reindex(other, op)","axis: Literal[1] = 1 # only relevant for Series other case"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"function_name":"_arith_method","function_position":"middle","evaluation_mode":"function_required_terms","expected_full_answer":" def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)"}} +{"id":"function-late-256k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-late-256k\nApproximate target context: 256000 tokens.\nReturn the complete source code of the Python function or method `_reindex_for_setitem`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A CSS id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
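<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>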
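\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>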
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
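)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 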
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` modifies the DataFrame in place and returns\n ``None``. Unlike ``frame.iloc[:, loc] = value``, which tries to update the\n existing values in place, ``frame.isetitem(loc, value)`` does not write into\n the column's existing array; it inserts a new array instead.\n\n In cases where ``frame.columns`` is unique and ``loc`` is an integer, this is\n equivalent to ``frame[frame.columns[loc]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified\n index or column labels removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar. The operator and the method return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list or a Series along an axis, using the operator or the method.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar by the DataFrame.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list by the DataFrame.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second dataframe's values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level consists of the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have matching counts of\n elements row-wise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a Series of list-likes to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a list-like or dict-like of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try to call apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `_\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace : Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional positional arguments have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the DataFrame.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimated covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimated correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a Series or\n along a DataFrame axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA value with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA value with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Return the axis that labels the result of an aggregation along\n ``axis_num``: the columns for 0, the index for 1.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the modes of wings\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n Setting ``dropna=False`` ``NaN`` values are considered and they can be\n the mode (like for wings).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode over columns and not rows, use the axis parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024\t 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Using `freq` which is the offset that the Timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either the index or the columns can be\n converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy representation of the DataFrame.\n\n .. 
warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned; the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieve the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n For example, if the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns(e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially 
mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n 
is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n 
lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. 
The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be cast, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the built-in `len()` function, which returns only the number of rows,\n `shape` provides both row and column counts, making it a more informative\n attribute for understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A CSS id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
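<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>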
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\" and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer to the pandas\n user guide on reading and writing remote files.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ...
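)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n ..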
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions <df-memory-usage>` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ...
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n \ndef _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom 
pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n 
TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n 
MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). 
If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... 
[(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or NumPy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series or a 1-D array, return the matrix product of\n self and other as a Series. If other is a DataFrame or a 2-D array,\n return the matrix product of self and other as a DataFrame.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n 
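<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>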
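\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>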
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
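)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 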
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g.
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, in64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... )\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockV\n```\n
","tags":["context-window","function-retrieval","python","late","256k"],"expected_answer":["def _reindex_for_setitem(","if value.index.equals(index) or not len(index):","if isinstance(value, Series):","return value._values, value._references"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"function_name":"_reindex_for_setitem","function_position":"late","evaluation_mode":"function_required_terms","expected_full_answer":"def _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None"}} +{"id":"function-two-blocks-256k","system_prompt":"You are a strict code retrieval engine. Return only the requested code blocks or NOT_FOUND.","prompt":"Context-window function retrieval item: function-two-blocks-256k\nApproximate target context: 256000 tokens.\nReturn the complete source code for `_construct_result` first, then a blank line, then the complete source code for `_to_dict_of_blocks`. For each function, include only its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide <basics.dataframe>` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be cast, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike `len()`, which returns only the number of rows, `shape`\n provides both row and column counts, making it a more informative way to\n gauge dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
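 <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>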
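\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>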
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
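)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 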
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for\n checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at the passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Whether index/col are interpreted as indexers.\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_colu\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n 
REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse 
import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n 
NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting.
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False,
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the result's dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>
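\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>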
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
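)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 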
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n <https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html>`__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n ``errors='raise'``.\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop : Drop specified labels from rows or columns.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine whether a row or column is removed from the DataFrame when it\n contains at least one NA value or only NA values.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Keep only rows or columns with at least this many non-NA values.\n Cannot be combined with ``how``.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating whether each row is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is marked False and all others True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is marked False and all others True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the smallest element, because all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the largest element, because all the\n ties are kept:\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise (``axis=0``), but levels can be swapped\n column-wise in a similar manner by passing ``axis=1``. Note that row-wise is\n the default behaviour. By not supplying any arguments for i and j, we swap the\n last and second to last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a DataFrame with a MultiIndex, by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a DataFrame with a MultiIndex, by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a DataFrame with a MultiIndex, by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator or the method; both return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list with the operator or the method, and a Series along the index axis.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with the operator or the method; both return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list with the operator or the method, and a Series along the index\n axis.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar with the operator or the method; both return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list with the operator or the method, and by a Series along the\n index axis.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update;\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is column to aggregate and the value is\n function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have matching counts of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent to a diff of 0\n # periods along axis=0, and the Manager method may be somewhat\n # more performant, so we dispatch to it in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a list-like or dict-like of funcs\n and the func isn't a string.\n If \"compat\", pandas will, if possible, first translate the func into\n pandas methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, it will call apply again with\n ``by_row=True``, and if that fails, it will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `_\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling frame's index.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional positional arguments have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the dataframe.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimated covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimated correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a Series or\n along a DataFrame axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom 
pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting.
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"
<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels.
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
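<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>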
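\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>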
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
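)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 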
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, in64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... )\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_colum\n```\n
","tags":["context-window","function-retrieval","python","two-functions","256k"],"expected_answer":["def _construct_result(self, result, other) -> DataFrame:","out = self._constructor(result, copy=False).__finalize__(self)","out.columns = self.columns","out.index = self.index","def _to_dict_of_blocks(self) -> dict[str, DataFrame]:","mgr = self._mgr","return {","k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"function_names":["_construct_result","_to_dict_of_blocks"],"function_position":"two_functions_20_and_80_percent","evaluation_mode":"two_function_required_terms","expected_full_answer":" def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }"}} +{"id":"function-negative-control-256k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-negative-control-256k\nApproximate target context: 256000 tokens.\nThe source may or may not contain a Python function named `_inferharness_missing_context_probe`. 
If the function is absent, reply exactly: NOT_FOUND.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
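<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>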
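\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>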
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
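)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 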
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to False and all others to True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to False and all others to True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via the ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order; to sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the smallest element; all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the largest element; all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we swap the levels of the row index (``axis=0``, the default); the levels\n of a column MultiIndex can be swapped in a similar manner with ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and a Series by axis with the operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and a Series by axis with the operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with the operator version, which returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and a Series by axis with the operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with the operator version, which returns the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and a Series by axis with the operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar with the operator version, which returns the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and a Series by axis with the operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update;\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level is the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the specified columns to explode is an empty list.\n * If the specified columns to explode do not have a matching count of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a Series of list-likes to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the DataFrame.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimated covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimated correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a Series or\n along a DataFrame axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a Series or\n along a DataFrame axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Return the axis labelling the result of a reduction along ``axis_num``.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the modes of ``wings``\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n With ``dropna=False``, ``NaN`` values are considered and they can be\n the mode (as for ``wings``).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode of each row instead of each column, use the ``axis`` parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024\t 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Using `freq` which is the offset that the Timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either index or columns can be\n converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy representation of the DataFrame.\n\n .. 
warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned, the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieving the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n e.g. If the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns(e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially 
mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n 
is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n 
lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. 
The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame (or as a Series\n if other is a one-dimensional array).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True``, the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
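<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>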
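\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>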
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML document as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
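)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n 
path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 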
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from NumPy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine whether a row or column is removed from the DataFrame when\n it contains at least one NA value or only NA values.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require at least that many non-NA values. Cannot be combined with ``how``.\n subset : column label or iterable of labels, optional\n Labels along the other axis to consider, e.g. if you are dropping rows,\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n # Invert the boolean duplicate mask to keep only non-duplicated rows\n result = self[~self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating, for each row, whether it is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is marked False and all others are marked True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is marked False and all others are marked True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via the ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we swap the levels of the row index (``axis=0``, the default); levels of\n the columns can be swapped in a similar manner with ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = 
lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def 
_arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n 
return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != 
len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. 
Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> 
tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n```\n
","tags":["context-window","function-retrieval","python","negative-control","256k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"function_name":"_inferharness_missing_context_probe","function_position":"absent","evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-function-retrieval-32k.jsonl b/backend/data/datasets/context-function-retrieval-32k.jsonl new file mode 100644 index 0000000..e919689 --- /dev/null +++ b/backend/data/datasets/context-function-retrieval-32k.jsonl @@ -0,0 +1,5 @@ +{"id":"function-front-32k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-front-32k\nApproximate target context: 32000 tokens.\nReturn the complete source code of the Python function or method `_constructor_from_mgr`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series ob\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n 
validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing 
import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# 
-----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as the ``@`` operator.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is a NumPy array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the index of the Series does not change the result,\n because the labels are aligned before multiplying.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :func:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])\n \"\"\"\n # [...]\n\n @classmethod\n def _from_arrays(\n cls,\n arrays,\n columns,\n index,\n dtype: Dtype | None = None,\n verify_integrity: bool = True,\n ) -> Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
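<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>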
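\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>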
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] |\n```\n
","tags":["context-window","function-retrieval","python","front","32k"],"expected_answer":["def _constructor_from_mgr(self, mgr, axes) -> DataFrame:","df = DataFrame._from_mgr(mgr, axes=axes)","if type(self) is DataFrame:","return df"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"function_name":"_constructor_from_mgr","function_position":"front","evaluation_mode":"function_required_terms","expected_full_answer":" def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)"}} +{"id":"function-middle-32k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-middle-32k\nApproximate target context: 32000 tokens.\nReturn the complete source code of the Python function or method `_arith_method`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict li\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n 
InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n 
sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n 
SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict li\n```\n
","tags":["context-window","function-retrieval","python","middle","32k"],"expected_answer":["def _arith_method(self, other, op) -> DataFrame:","if self._should_reindex_frame_op(other, op, 1, None, None):","return self._arith_method_with_reindex(other, op)","axis: Literal[1] = 1 # only relevant for Series other case"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"function_name":"_arith_method","function_position":"middle","evaluation_mode":"function_required_terms","expected_full_answer":" def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)"}} +{"id":"function-late-32k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-late-32k\nApproximate target context: 32000 tokens.\nReturn the complete source code of the Python function or method `_reindex_for_setitem`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the , should be the first line.\n val = buf.getvalue().replace(\"<\", r\"<\", 1)\n val = val.replace(\">\", r\">\", 1)\n return f\"
<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(, {'col1': 1, 'col2': 0.5}),\n defaultdict(, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays in the form in which they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as the index and are aligned with it, and that `columns` and\n `index` are already Index objects.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to the Stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, por\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data 
manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom 
pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom 
pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. 
If\n data is a dict, column order follows insertion order. If a dict contains Series\n that have an index defined, each is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion order.\n\n index : Index or array-like\n Index to use for resulting frame. Defaults to RangeIndex if\n the input data carries no indexing information and no index is provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n Ignored if ``data`` is a DataFrame.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = 
pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ...\n```\n
","tags":["context-window","function-retrieval","python","late","32k"],"expected_answer":["def _reindex_for_setitem(","if value.index.equals(index) or not len(index):","if isinstance(value, Series):","return value._values, value._references"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"function_name":"_reindex_for_setitem","function_position":"late","evaluation_mode":"function_required_terms","expected_full_answer":"def _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None"}} +{"id":"function-two-blocks-32k","system_prompt":"You are a strict code retrieval engine. Return only the requested code blocks or NOT_FOUND.","prompt":"Context-window function retrieval item: function-two-blocks-32k\nApproximate target context: 32000 tokens.\nReturn the complete source code for `_construct_result` first, then a blank line, then the complete source code for `_to_dict_of_blocks`. For each function, include only its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> Data\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import 
function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import 
(\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n 
ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. 
If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific \n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom 
pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n 
check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# 
-----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataF\n```\n
","tags":["context-window","function-retrieval","python","two-functions","32k"],"expected_answer":["def _construct_result(self, result, other) -> DataFrame:","out = self._constructor(result, copy=False).__finalize__(self)","out.columns = self.columns","out.index = self.index","def _to_dict_of_blocks(self) -> dict[str, DataFrame]:","mgr = self._mgr","return {","k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"function_names":["_construct_result","_to_dict_of_blocks"],"function_position":"two_functions_20_and_80_percent","evaluation_mode":"two_function_required_terms","expected_full_answer":" def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }"}} +{"id":"function-negative-control-32k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-negative-control-32k\nApproximate target context: 32000 tokens.\nThe source may or may not contain a Python function named `_inferharness_missing_context_probe`. 
If the function is absent, reply exactly: NOT_FOUND.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting.
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default).
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
014
123
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
14
23
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n```\n
","tags":["context-window","function-retrieval","python","negative-control","32k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"function_name":"_inferharness_missing_context_probe","function_position":"absent","evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-function-retrieval-4k.jsonl b/backend/data/datasets/context-function-retrieval-4k.jsonl new file mode 100644 index 0000000..246f3e0 --- /dev/null +++ b/backend/data/datasets/context-function-retrieval-4k.jsonl @@ -0,0 +1,5 @@ +{"id":"function-front-4k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-front-4k\nApproximate target context: 4000 tokens.\nReturn the complete source code of the Python function or method `_constructor_from_mgr`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._lib\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs 
import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom 
pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n 
DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. 
Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n 
... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. 
Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr\n```\n","tags":["context-window","function-retrieval","python","front","4k"],"expected_answer":["def _constructor_from_mgr(self, mgr, axes) -> DataFrame:","df = DataFrame._from_mgr(mgr, axes=axes)","if type(self) is DataFrame:","return df"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"function_name":"_constructor_from_mgr","function_position":"front","evaluation_mode":"function_required_terms","expected_full_answer":" def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)"}} +{"id":"function-middle-4k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-middle-4k\nApproximate target context: 4000 tokens.\nReturn the complete source code of the Python function or method `_arith_method`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n 
find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom 
pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol 
import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n```\n","tags":["context-window","function-retrieval","python","middle","4k"],"expected_answer":["def _arith_method(self, other, op) -> DataFrame:","if self._should_reindex_frame_op(other, op, 1, None, None):","return self._arith_method_with_reindex(other, op)","axis: Literal[1] = 1 # only relevant for Series other case"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"function_name":"_arith_method","function_position":"middle","evaluation_mode":"function_required_terms","expected_full_answer":" def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)"}} +{"id":"function-late-4k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-late-4k\nApproximate target context: 4000 tokens.\nReturn the complete source code of the Python function or method `_reindex_for_setitem`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not c\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R 
counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n 
is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import Sparse\n```\n","tags":["context-window","function-retrieval","python","late","4k"],"expected_answer":["def _reindex_for_setitem(","if value.index.equals(index) or not len(index):","if isinstance(value, Series):","return value._values, value._references"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"function_name":"_reindex_for_setitem","function_position":"late","evaluation_mode":"function_required_terms","expected_full_answer":"def _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return 
reindexed_value, None"}} +{"id":"function-two-blocks-4k","system_prompt":"You are a strict code retrieval engine. Return only the requested code blocks or NOT_FOUND.","prompt":"Context-window function retrieval item: function-two-blocks-4k\nApproximate target context: 4000 tokens.\nReturn the complete source code for `_construct_result` first, then a blank line, then the complete source code for `_to_dict_of_blocks`. For each function, include only its `def` line through the end of that function. Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom 
pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfr\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = 
out.__finalize__(other)\n return out\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n 
is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt 
import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. 
The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], \n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport 
operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom 
pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfro\n```\n","tags":["context-window","function-retrieval","python","two-functions","4k"],"expected_answer":["def _construct_result(self, result, other) -> DataFrame:","out = self._constructor(result, copy=False).__finalize__(self)","out.columns = self.columns","out.index = self.index","def _to_dict_of_blocks(self) -> dict[str, DataFrame]:","mgr = self._mgr","return {","k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"function_names":["_construct_result","_to_dict_of_blocks"],"function_position":"two_functions_20_and_80_percent","evaluation_mode":"two_function_required_terms","expected_full_answer":" def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, 
axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }"}} +{"id":"function-negative-control-4k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-negative-control-4k\nApproximate target context: 4000 tokens.\nThe source may or may not contain a Python function named `_inferharness_missing_context_probe`. If the function is absent, reply exactly: NOT_FOUND.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom 
pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n 
MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from 
pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n```\n","tags":["context-window","function-retrieval","python","negative-control","4k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"function_name":"_inferharness_missing_context_probe","function_position":"absent","evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-function-retrieval-64k.jsonl b/backend/data/datasets/context-function-retrieval-64k.jsonl new file mode 100644 index 0000000..b45bfb2 --- /dev/null +++ b/backend/data/datasets/context-function-retrieval-64k.jsonl @@ -0,0 +1,5 @@ +{"id":"function-front-64k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-front-64k\nApproximate target context: 64000 tokens.\nReturn the complete source code of the Python function or method `_constructor_from_mgr`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n \n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n 
validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom 
pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import 
Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels.
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
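<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>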
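\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>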
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
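)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n ..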
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible for\n checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Note that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified\n index or column labels removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # I\n```\n
","tags":["context-window","function-retrieval","python","front","64k"],"expected_answer":["def _constructor_from_mgr(self, mgr, axes) -> DataFrame:","df = DataFrame._from_mgr(mgr, axes=axes)","if type(self) is DataFrame:","return df"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"function_name":"_constructor_from_mgr","function_position":"front","evaluation_mode":"function_required_terms","expected_full_answer":" def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)"}} +{"id":"function-middle-64k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-middle-64k\nApproximate target context: 64000 tokens.\nReturn the complete source code of the Python function or method `_arith_method`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n 
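<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>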
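\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>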
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circ\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n 
deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n 
default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import 
DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling the index of other does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
014
123
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
14
23
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... )\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circ\n```\n
","tags":["context-window","function-retrieval","python","middle","64k"],"expected_answer":["def _arith_method(self, other, op) -> DataFrame:","if self._should_reindex_frame_op(other, op, 1, None, None):","return self._arith_method_with_reindex(other, op)","axis: Literal[1] = 1 # only relevant for Series other case"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"function_name":"_arith_method","function_position":"middle","evaluation_mode":"function_required_terms","expected_full_answer":" def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)"}} +{"id":"function-late-64k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-late-64k\nApproximate target context: 64000 tokens.\nReturn the complete source code of the Python function or method `_reindex_for_setitem`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or NumPy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a NumPy array, return\n the matrix product of self and other as a DataFrame (a Series if the\n array is 1-D).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is a NumPy array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the index of other does not change the result,\n because labels are aligned before multiplying.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the result's dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
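<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>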
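\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>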
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter would be in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
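)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 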
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows, assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe : Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage : Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for _, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col are interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from NumPy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not prese\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n 
ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n 
sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n 
Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the , should be the first line.\n val = buf.getvalue().replace(\"<\", r\"<\", 1)\n val = val.replace(\">\", r\">\", 1)\n return f\"
<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n```\n
","tags":["context-window","function-retrieval","python","late","64k"],"expected_answer":["def _reindex_for_setitem(","if value.index.equals(index) or not len(index):","if isinstance(value, Series):","return value._values, value._references"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"function_name":"_reindex_for_setitem","function_position":"late","evaluation_mode":"function_required_terms","expected_full_answer":"def _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None"}} +{"id":"function-two-blocks-64k","system_prompt":"You are a strict code retrieval engine. Return only the requested code blocks or NOT_FOUND.","prompt":"Context-window function retrieval item: function-two-blocks-64k\nApproximate target context: 64000 tokens.\nReturn the complete source code for `_construct_result` first, then a blank line, then the complete source code for `_to_dict_of_blocks`. For each function, include only its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be cast, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike `len()`, which returns only the number of rows, `shape`\n provides both row and column counts, making it more informative for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n 
Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom 
pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n 
Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or NumPy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a NumPy array, return\n the matrix product of self and other as a DataFrame (or as a Series\n when the result is 1-dimensional).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how reordering the index of the other object does not change the\n result, because labels are aligned before multiplying.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True``, the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires the `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently, timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n 
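<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>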
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
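<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>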
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
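)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 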
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: igno\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for 
potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n 
is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n 
get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. 
The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(\n```\n
","tags":["context-window","function-retrieval","python","two-functions","64k"],"expected_answer":["def _construct_result(self, result, other) -> DataFrame:","out = self._constructor(result, copy=False).__finalize__(self)","out.columns = self.columns","out.index = self.index","def _to_dict_of_blocks(self) -> dict[str, DataFrame]:","mgr = self._mgr","return {","k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"function_names":["_construct_result","_to_dict_of_blocks"],"function_position":"two_functions_20_and_80_percent","evaluation_mode":"two_function_required_terms","expected_full_answer":" def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }"}} +{"id":"function-negative-control-64k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-negative-control-64k\nApproximate target context: 64000 tokens.\nThe source may or may not contain a Python function named `_inferharness_missing_context_probe`. 
If the function is absent, reply exactly: NOT_FOUND.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame (or as a Series\n when the array is one-dimensional).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th></th>\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <th>0</th>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <th>1</th>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>
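\n\n HTML output\n\n +----+-----+-----+\n |    |col1 |col2 |\n +====+=====+=====+\n |0   |1    |4    |\n +----+-----+-----+\n |1   |2    |3    |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>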
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\" or \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
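)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n   <row>\n     <index>0</index>\n     <shape>square</shape>\n     <degrees>360</degrees>\n     <sides>4.0</sides>\n   </row>\n   <row>\n     <index>1</index>\n     <shape>circle</shape>\n     <degrees>360</degrees>\n     <sides/>\n   </row>\n   <row>\n     <index>2</index>\n     <shape>triangle</shape>\n     <degrees>180</degrees>\n     <sides>3.0</sides>\n   </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n   <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n   <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n   <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n   <doc:row>\n     <doc:index>0</doc:index>\n     <doc:shape>square</doc:shape>\n     <doc:degrees>360</doc:degrees>\n     <doc:sides>4.0</doc:sides>\n   </doc:row>\n   <doc:row>\n     <doc:index>1</doc:index>\n     <doc:shape>circle</doc:shape>\n     <doc:degrees>360</doc:degrees>\n     <doc:sides/>\n   </doc:row>\n   <doc:row>\n     <doc:index>2</doc:index>\n     <doc:shape>triangle</doc:shape>\n     <doc:degrees>180</doc:degrees>\n     <doc:sides>3.0</doc:sides>\n   </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 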
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for\n checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n Returns DataFrame or None DataFrame with the specified\n index or column labels removed or None if inplace=True.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensi\n```\n
","tags":["context-window","function-retrieval","python","negative-control","64k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"function_name":"_inferharness_missing_context_probe","function_position":"absent","evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-function-retrieval-8k.jsonl b/backend/data/datasets/context-function-retrieval-8k.jsonl new file mode 100644 index 0000000..97db195 --- /dev/null +++ b/backend/data/datasets/context-function-retrieval-8k.jsonl @@ -0,0 +1,5 @@ +{"id":"function-front-8k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-front-8k\nApproximate target context: 8000 tokens.\nReturn the complete source code of the Python function or method `_constructor_from_mgr`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pand\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to 
handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n```\n
","tags":["context-window","function-retrieval","python","front","8k"],"expected_answer":["def _constructor_from_mgr(self, mgr, axes) -> DataFrame:","df = DataFrame._from_mgr(mgr, axes=axes)","if type(self) is DataFrame:","return df"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"function_name":"_constructor_from_mgr","function_position":"front","evaluation_mode":"function_required_terms","expected_full_answer":" def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)"}} +{"id":"function-middle-8k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-middle-8k\nApproximate target context: 8000 tokens.\nReturn the complete source code of the Python function or method `_arith_method`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n \n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n```\n","tags":["context-window","function-retrieval","python","middle","8k"],"expected_answer":["def _arith_method(self, other, op) -> DataFrame:","if self._should_reindex_frame_op(other, op, 1, None, None):","return self._arith_method_with_reindex(other, op)","axis: Literal[1] = 1 # only relevant for Series other case"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"function_name":"_arith_method","function_position":"middle","evaluation_mode":"function_required_terms","expected_full_answer":" def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)"}} +{"id":"function-late-8k","system_prompt":"You are a strict code retrieval engine. Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-late-8k\nApproximate target context: 8000 tokens.\nReturn the complete source code of the Python function or method `_reindex_for_setitem`, from its `def` line through the end of that function. 
Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n 
infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the c\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns)\n```\n","tags":["context-window","function-retrieval","python","late","8k"],"expected_answer":["def _reindex_for_setitem(","if value.index.equals(index) or not len(index):","if isinstance(value, Series):","return value._values, value._references"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"function_name":"_reindex_for_setitem","function_position":"late","evaluation_mode":"function_required_terms","expected_full_answer":"def _reindex_for_setitem(\n value: DataFrame | Series, index: Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n return reindexed_value, None"}} +{"id":"function-two-blocks-8k","system_prompt":"You are a strict code retrieval engine. 
Return only the requested code blocks or NOT_FOUND.","prompt":"Context-window function retrieval item: function-two-blocks-8k\nApproximate target context: 8000 tokens.\nReturn the complete source code for `_construct_result` first, then a blank line, then the complete source code for `_to_dict_of_blocks`. For each function, include only its `def` line through the end of that function. Do not include surrounding code, markdown, or explanation.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n 
validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing 
import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# 
-----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure als\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n \n def 
_to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n 
construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom 
pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n 
\"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also\n```\n","tags":["context-window","function-retrieval","python","two-functions","8k"],"expected_answer":["def _construct_result(self, result, other) -> DataFrame:","out = self._constructor(result, copy=False).__finalize__(self)","out.columns = self.columns","out.index = self.index","def _to_dict_of_blocks(self) -> dict[str, DataFrame]:","mgr = self._mgr","return {","k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)"],"expected_format":"code","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"function_names":["_construct_result","_to_dict_of_blocks"],"function_position":"two_functions_20_and_80_percent","evaluation_mode":"two_function_required_terms","expected_full_answer":" def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n \"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }"}} +{"id":"function-negative-control-8k","system_prompt":"You are a strict code retrieval engine. 
Return only the requested code block or NOT_FOUND.","prompt":"Context-window function retrieval item: function-negative-control-8k\nApproximate target context: 8000 tokens.\nThe source may or may not contain a Python function named `_inferharness_missing_context_probe`. If the function is absent, reply exactly: NOT_FOUND.\n\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n 
construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import 
BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass 
DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool |\n```\n
","tags":["context-window","function-retrieval","python","negative-control","8k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"function_name":"_inferharness_missing_context_probe","function_position":"absent","evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-needle-128k.jsonl b/backend/data/datasets/context-needle-128k.jsonl new file mode 100644 index 0000000..c3ed33e --- /dev/null +++ b/backend/data/datasets/context-needle-128k.jsonl @@ -0,0 +1,5 @@ +{"id":"needle-front-128k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-front-128k\nApproximate target context: 128000 tokens; needle position: front.\nFind the Python benchmark needle for needle-front-128k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import 
(\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom 
pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n 
MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. 
Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n# InferHarness context needle: needle-front-128k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_128K_FRONT\"\n# End InferHarness context needle\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. 
numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not 
needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of str is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer to the\n pandas I/O user guide section on reading and writing remote files.\n\n **kwargs\n These parameters will be passed to ``tabulate``.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the ``tabulate`` package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
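<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>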
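\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>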
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible for\n checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at the passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Whether index/col are interpreted as positional indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change the input DataFrame (though pandas does not check this).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is marked False and all others True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is marked False and all others True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we swap levels of the row index (``axis=0``, the default); column\n levels can be swapped in a similar manner with ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one's values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`.\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update;\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs, the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have matching counts of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # lest for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a Dataframe elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n```\n
","tags":["context-window","needle-retrieval","python","front","128k"],"expected_answer":"IH_NEEDLE_128K_FRONT","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"needle_position":"front","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-middle-128k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-middle-128k\nApproximate target context: 128000 tokens; needle position: middle.\nFind the Python benchmark needle for needle-middle-128k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow 
import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires the `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently, timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content, you can wrap the returned\n bytes in io.BytesIO:\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
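<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>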
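\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>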
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
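)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 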
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for _, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from NumPy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Note that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed, or\n None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n# InferHarness context needle: needle-middle-128k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_128K_MIDDLE\"\n# End InferHarness context needle\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates; by\n default, use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates; by\n default, use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating whether each row is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to ``False`` and all others to ``True``.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to ``False`` and all others to ``True``.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to ``False``, all duplicates are marked ``True``.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``.\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``;\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``;\n if there are duplicate values for the largest element, all the\n ties are kept:\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each\n element is a str or tuple; the list-like data of all specified\n columns on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have a matching count of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # lest for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a Dataframe elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n```\n
","tags":["context-window","needle-retrieval","python","middle","128k"],"expected_answer":"IH_NEEDLE_128K_MIDDLE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"needle_position":"middle","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-late-128k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-late-128k\nApproximate target context: 128000 tokens; needle position: late_80_percent.\nFind the Python benchmark needle for needle-late-128k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow 
import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
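<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>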
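\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>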
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
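)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 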
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrame's index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : Evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series marking each duplicated row.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to False and all others to True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to False and all others to True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``.\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``.\n If there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``.\n If there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise, but levels can be swapped column-wise\n in a similar manner. Note that row-wise (``axis=0``) is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n# InferHarness context needle: needle-late-128k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_128K_LATE\"\n# End InferHarness context needle\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update;\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the specified columns to explode is an empty list.\n * If the specified columns to explode do not have matching counts of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a DataFrame from list-like columns to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the Series.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # lest for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a Dataframe elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n```\n
","tags":["context-window","needle-retrieval","python","late_80_percent","128k"],"expected_answer":"IH_NEEDLE_128K_LATE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"needle_position":"late_80_percent","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-two-facts-128k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-two-facts-128k\nApproximate target context: 128000 tokens; needle position: two_facts_20_and_80_percent.\nFind both Python benchmark needles for needle-two-facts-128k. Reply with the string assigned to BENCHMARK_NEEDLE_ALPHA, then a | separator, then the string assigned to BENCHMARK_NEEDLE_BETA, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom 
pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol.\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is a np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n# InferHarness context needle: needle-two-facts-128k alpha\nBENCHMARK_NEEDLE_ALPHA = \"IH_NEEDLE_128K_ALPHA\"\n# End InferHarness context needle alpha\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n 
show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. 
This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
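<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>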
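\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>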
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
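)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 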
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether index/col are interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified\n index or column labels removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n# InferHarness context needle: needle-two-facts-128k beta\nBENCHMARK_NEEDLE_BETA = \"IH_NEEDLE_128K_BETA\"\n# End InferHarness context needle beta\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is column to aggregate and the value is\n function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list with each element\n be str or tuple, and all specified columns their list-like data\n on same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If specified columns to explode is empty list.\n * If specified columns to explode have not matching count of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a DataFrame from list-like columns to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the Series.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # lest for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a Dataframe elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n```\n
","tags":["context-window","needle-retrieval","python","two-facts","128k"],"expected_answer":"IH_NEEDLE_128K_ALPHA|IH_NEEDLE_128K_BETA","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"needle_position":"two_facts_20_and_80_percent","needle_count":2,"evaluation_mode":"two_fact_exact_values"}} +{"id":"negative-control-128k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: negative-control-128k\nApproximate target context: 128000 tokens; needle position: absent.\nThe source may or may not contain a Python benchmark needle for negative-control-128k. If the needle is absent, reply exactly: NOT_FOUND.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom 
pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import 
NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n 
ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is a NumPy array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the index of other does not change the result,\n because labels are aligned before the multiplication.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True``, the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n 
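<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>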
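\n\n HTML output\n\n +----+-----+-----+\n |    |col1 |col2 |\n +====+=====+=====+\n |0   |1    |4    |\n +----+-----+-----+\n |1   |2    |3    |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>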
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
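)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 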
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ...
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the smallest element; all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the largest element; all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise, but levels can be swapped column-wise\n in a similar manner. Note that row-wise (``axis=0``) is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with the operator version, which returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract the DataFrame from a scalar; the operator and method versions\n return the same results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract the DataFrame from a list and from a Series by axis, with the\n operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar; the operator and method versions return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and a Series by axis, with the operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar by the DataFrame.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list by the DataFrame.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Integer-divide a scalar by the DataFrame.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Integer-divide a list by the DataFrame.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two Series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe keeps the values of the 'first' dataframe, which take\n precedence over those of the second wherever both first.loc[index, col]\n and second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise a `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is column to aggregate and the value is\n function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list with each element\n be str or tuple, and all specified columns their list-like data\n on same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If specified columns to explode is empty list.\n * If specified columns to explode have not matching count of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a DataFrame from list-like columns to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the Series.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # lest for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a Dataframe elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n```\n
","tags":["context-window","needle-retrieval","python","negative-control","128k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":128000,"needle_position":"absent","needle_count":0,"evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-needle-16k.jsonl b/backend/data/datasets/context-needle-16k.jsonl new file mode 100644 index 0000000..59745ef --- /dev/null +++ b/backend/data/datasets/context-needle-16k.jsonl @@ -0,0 +1,5 @@ +{"id":"needle-front-16k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-front-16k\nApproximate target context: 16000 tokens; needle position: front.\nFind the Python benchmark needle for needle-front-16k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom 
pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n# InferHarness context needle: needle-front-16k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_16K_FRONT\"\n# End InferHarness context needle\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n 
NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n 
JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). 
If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... 
[(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n```\n
","tags":["context-window","needle-retrieval","python","front","16k"],"expected_answer":"IH_NEEDLE_16K_FRONT","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"needle_position":"front","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-middle-16k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-middle-16k\nApproximate target context: 16000 tokens; needle position: middle.\nFind the Python benchmark needle for needle-middle-16k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n# InferHarness context needle: needle-middle-16k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_16K_MIDDLE\"\n# End InferHarness context needle\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | 
None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. 
If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. 
By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization 
<../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. 
Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n```\n
","tags":["context-window","needle-retrieval","python","middle","16k"],"expected_answer":"IH_NEEDLE_16K_MIDDLE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"needle_position":"middle","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-late-16k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-late-16k\nApproximate target context: 16000 tokens; needle position: late_80_percent.\nFind the Python benchmark needle for needle-late-16k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import 
(\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n# InferHarness context needle: needle-late-16k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_16K_LATE\"\n# End InferHarness context needle\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. 
versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n```\n
","tags":["context-window","needle-retrieval","python","late_80_percent","16k"],"expected_answer":"IH_NEEDLE_16K_LATE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"needle_position":"late_80_percent","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-two-facts-16k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-two-facts-16k\nApproximate target context: 16000 tokens; needle position: two_facts_20_and_80_percent.\nFind both Python benchmark needles for needle-two-facts-16k. Reply with the string assigned to BENCHMARK_NEEDLE_ALPHA, then a | separator, then the string assigned to BENCHMARK_NEEDLE_BETA, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom 
pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n# InferHarness context needle: needle-two-facts-16k alpha\nBENCHMARK_NEEDLE_ALPHA = \"IH_NEEDLE_16K_ALPHA\"\n# End InferHarness context needle alpha\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. 
numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not 
needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n# InferHarness context needle: needle-two-facts-16k beta\nBENCHMARK_NEEDLE_BETA = \"IH_NEEDLE_16K_BETA\"\n# End InferHarness context needle beta\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. 
versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n```\n
","tags":["context-window","needle-retrieval","python","two-facts","16k"],"expected_answer":"IH_NEEDLE_16K_ALPHA|IH_NEEDLE_16K_BETA","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"needle_position":"two_facts_20_and_80_percent","needle_count":2,"evaluation_mode":"two_fact_exact_values"}} +{"id":"negative-control-16k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: negative-control-16k\nApproximate target context: 16000 tokens; needle position: absent.\nThe source may or may not contain a Python benchmark needle for negative-control-16k. If the needle is absent, reply exactly: NOT_FOUND.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom 
pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import 
NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n 
ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n```\n
","tags":["context-window","needle-retrieval","python","negative-control","16k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":16000,"needle_position":"absent","needle_count":0,"evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-needle-256k.jsonl b/backend/data/datasets/context-needle-256k.jsonl new file mode 100644 index 0000000..8b0ad76 --- /dev/null +++ b/backend/data/datasets/context-needle-256k.jsonl @@ -0,0 +1,5 @@ +{"id":"needle-front-256k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-front-256k\nApproximate target context: 256000 tokens; needle position: front.\nFind the Python benchmark needle for needle-front-256k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n 
REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse 
import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n 
NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n# InferHarness context needle: needle-front-256k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_256K_FRONT\"\n# End InferHarness context needle\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = 
...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. 
If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. 
By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization 
<../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. 
Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument.
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
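<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>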
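\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>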
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter would be in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
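)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 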
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise a `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update;\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is column to aggregate and the value is\n function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have a matching count of\n elements row-wise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like Series to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a list-like or dict-like of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try to call apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `_\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace : Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional positional arguments have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the dataframe.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimate covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimate correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a series or\n along a Dataframe axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA value with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA value with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Return the axis that labels the result of a reduction along ``axis_num``:\n the columns for axis 0, the index for axis 1.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the modes of ``wings``\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n Setting ``dropna=False`` ``NaN`` values are considered and they can be\n the mode (like for wings).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode over columns and not rows, use the axis parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Using `freq` which is the offset that the Timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either the index or the columns\n can be converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy 
representation of the DataFrame.\n\n .. warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned, the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieving the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n e.g. If the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns(e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\n\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: 
Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n```\n
","tags":["context-window","needle-retrieval","python","front","256k"],"expected_answer":"IH_NEEDLE_256K_FRONT","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"needle_position":"front","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-middle-256k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-middle-256k\nApproximate target context: 256000 tokens; needle position: middle.\nFind the Python benchmark needle for needle-middle-256k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow 
import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
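<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>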
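\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>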
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
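)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 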
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Note that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels of the row index (``axis=0``), but the levels\n of the columns can be swapped in a similar manner with ``axis=1``. Note that\n ``axis=0`` is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing,\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with the operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing,\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with the operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing,\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with the operator version, which returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with the operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing,\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract from a scalar with the operator version, which returns the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract from a list and Series by axis with the operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract from a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing,\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar with the operator version, which returns the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series by axis with the operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply by a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing,\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing,\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
After calling\n first.combine_first(second), the result keeps the value from `first`\n wherever first.loc[index, col] is not missing, and takes the value from\n `second` only where first.loc[index, col] is missing.\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You can also pass a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value\n is a function or list of functions. If ``margins=True``, aggfunc will\n be used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple. The list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have matching counts\n of elements row-wise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like Series to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. 
You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n# InferHarness context needle: needle-middle-256k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_256K_MIDDLE\"\n# End InferHarness context needle\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the dataframe.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimate covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimate correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a series or\n along a DataFrame axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Let's be explicit about this.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the modes of ``wings``\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n With ``dropna=False``, ``NaN`` values are considered and they can be\n the mode (as for ``wings``).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode over columns and not rows, use the axis parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024\t 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Using `freq` which is the offset that the Timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either index or columns can be\n converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy 
representation of the DataFrame.\n\n .. warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned, the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieving the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n e.g. If the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns(e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\n\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: 
Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n```\n
","tags":["context-window","needle-retrieval","python","middle","256k"],"expected_answer":"IH_NEEDLE_256K_MIDDLE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"needle_position":"middle","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-late-256k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-late-256k\nApproximate target context: 256000 tokens; needle position: late_80_percent.\nFind the Python benchmark needle for needle-late-256k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow 
import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])\n \"\"\"\n Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A CSS id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
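<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>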
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
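<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>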
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\" or \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
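)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 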
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : Evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from NumPy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Note that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, perform the operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide <missing_data>` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates; by\n default, use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via the ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order; to sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise (``axis=0``), but levels can be swapped\n column-wise in a similar manner. Note that row-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can, for example, swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level contains the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value\n is a function or list of functions. If ``margins=True``, aggfunc will\n be used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data in all specified columns must\n have matching lengths on each row of the frame.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have matching\n element counts row-wise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to the current dtype of the DataFrame;\n however, the dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a list-like or dict-like of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try calling apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `_\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional positional arguments have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keyword arguments have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the dataframe.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimated covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimated correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a series or\n along a DataFrame axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be a 1d np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Let's be explicit about this.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the mode of wings\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n Setting ``dropna=False`` ``NaN`` values are considered and they can be\n the mode (like for wings).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode over columns and not rows, use the axis parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024\t 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Using `freq` which is the offset that the Timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either the index or the\n columns can be converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy 
representation of the DataFrame.\n\n .. warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned, the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieving the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n e.g. If the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns(e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\n\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: 
Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n# InferHarness context needle: needle-late-256k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_256K_LATE\"\n# End InferHarness context needle\ne\n```\n
","tags":["context-window","needle-retrieval","python","late_80_percent","256k"],"expected_answer":"IH_NEEDLE_256K_LATE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"needle_position":"late_80_percent","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-two-facts-256k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-two-facts-256k\nApproximate target context: 256000 tokens; needle position: two_facts_20_and_80_percent.\nFind both Python benchmark needles for needle-two-facts-256k. Reply with the string assigned to BENCHMARK_NEEDLE_ALPHA, then a | separator, then the string assigned to BENCHMARK_NEEDLE_BETA, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom 
pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default).
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2.
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated.
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read an ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
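 <table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th></th>\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <th>0</th>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <th>1</th>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>\n\n HTML output\n\n +----+-----+-----+\n |    |col1 |col2 |\n +====+=====+=====+\n |0   |1    |4    |\n +----+-----+-----+\n |1   |2    |3    |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>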
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1    |4    |\n +-----+-----+\n |2    |3    |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter would be in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
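)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n   <row>\n     <index>0</index>\n     <shape>square</shape>\n     <degrees>360</degrees>\n     <sides>4.0</sides>\n   </row>\n   <row>\n     <index>1</index>\n     <shape>circle</shape>\n     <degrees>360</degrees>\n     <sides/>\n   </row>\n   <row>\n     <index>2</index>\n     <shape>triangle</shape>\n     <degrees>180</degrees>\n     <sides>3.0</sides>\n   </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n   <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n   <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n   <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n   <doc:row>\n     <doc:index>0</doc:index>\n     <doc:shape>square</doc:shape>\n     <doc:degrees>360</doc:degrees>\n     <doc:sides>4.0</doc:sides>\n   </doc:row>\n   <doc:row>\n     <doc:index>1</doc:index>\n     <doc:shape>circle</doc:shape>\n     <doc:degrees>360</doc:degrees>\n     <doc:sides/>\n   </doc:row>\n   <doc:row>\n     <doc:index>2</doc:index>\n     <doc:shape>triangle</doc:shape>\n     <doc:degrees>180</doc:degrees>\n     <doc:sides>3.0</doc:sides>\n   </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 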
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, in64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations most\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n Returns DataFrame or None DataFrame with the specified\n index or column labels removed or None if inplace=True.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n# InferHarness context needle: needle-two-facts-256k alpha\nBENCHMARK_NEEDLE_ALPHA = \"IH_NEEDLE_256K_ALPHA\"\n# End InferHarness context needle alpha\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is marked False and all others are marked True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is marked False and all others are marked True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``.\n If there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``.\n If there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise, but levels can be swapped column-wise\n in a similar manner. Note that row-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level are the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the specified list of columns to explode is empty.\n * If the specified columns to explode do not have matching counts of\n elements rowwise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Explode a DataFrame from list-like columns to long format.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a list-like or dict-like of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try calling apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `_\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a DataFrame\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the dataframe.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimate covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimate correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a series or\n along a DataFrame axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Let's be explicit about this.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the modes of ``wings``\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n With ``dropna=False``, ``NaN`` values are considered and they can be\n the mode (as for ``wings``).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode over columns and not rows, use the axis parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Use `freq` to set the offset that the timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either the index or the columns can be\n converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy 
representation of the DataFrame.\n\n .. warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned, the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieving the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n e.g. If the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns(e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\n\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: 
Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n# InferHarness context needle: needle-two-facts-256k beta\nBENCHMARK_NEEDLE_BETA = \"IH_NEEDLE_256K_BETA\"\n# End InferHarness context needle beta\ne\n```\n
","tags":["context-window","needle-retrieval","python","two-facts","256k"],"expected_answer":"IH_NEEDLE_256K_ALPHA|IH_NEEDLE_256K_BETA","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"needle_position":"two_facts_20_and_80_percent","needle_count":2,"evaluation_mode":"two_fact_exact_values"}} +{"id":"negative-control-256k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: negative-control-256k\nApproximate target context: 256000 tokens; needle position: absent.\nThe source may or may not contain a Python benchmark needle for negative-control-256k. If the needle is absent, reply exactly: NOT_FOUND.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom 
pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import 
NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n 
ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
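<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>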
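\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>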
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML document as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n Returns DataFrame or None DataFrame with the specified\n index or column labels removed or None if inplace=True.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating whether each row is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ...
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, …, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels of the row index, but column levels can be\n swapped in a similar manner with ``axis=1``. Note that ``axis=0`` (the row\n index) is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j.
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ...
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ...
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ...
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ...
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ...
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar; the operator and the method return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and a Series by axis, using the operator or the method.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar; the operator and the method return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and a Series by axis, using the operator or the method.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar; the operator and the method return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and a Series by axis, using the operator or the method.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract from a scalar; the operator and the method return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract from a list and a Series by axis, using the operator or the method.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar; the operator and the method return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and a Series by axis, using the operator or the method.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
Use `DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see the hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n def pivot_table(\n self,\n values=None,\n index=None,\n columns=None,\n aggfunc: AggFuncType = \"mean\",\n fill_value=None,\n margins: bool = False,\n dropna: bool = True,\n margins_name: Level = \"All\",\n observed: bool = True,\n sort: bool = True,\n **kwargs,\n ) -> DataFrame:\n \"\"\"\n Create a spreadsheet-style pivot table as a DataFrame.\n\n The levels in the pivot table will be stored in MultiIndex objects\n (hierarchical indexes) on the index and columns of the result DataFrame.\n\n Parameters\n ----------\n values : list-like or scalar, optional\n Column or columns to aggregate.\n index : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). 
If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n columns : column, Grouper, array, or sequence of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.\n aggfunc : function, list of functions, dict, default \"mean\"\n If a list of functions is passed, the resulting pivot table will have\n hierarchical columns whose top level contains the function names\n (inferred from the function objects themselves).\n If a dict is passed, the key is the column to aggregate and the value is\n a function or list of functions. If ``margins=True``, aggfunc will be\n used to calculate the partial aggregates.\n fill_value : scalar, default None\n Value to replace missing values with (in the resulting pivot table,\n after aggregation).\n margins : bool, default False\n If ``margins=True``, special ``All`` columns and rows\n will be added with partial group aggregates across the categories\n on the rows and columns.\n dropna : bool, default True\n Do not include columns whose entries are all NaN. If True,\n\n * rows with an NA value in any column will be omitted before computing\n margins,\n * index/column keys containing NA values will be dropped (see ``dropna``\n parameter in :meth:`DataFrame.groupby`).\n\n margins_name : str, default 'All'\n Name of the row / column that will contain the totals\n when margins is True.\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. 
versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n sort : bool, default True\n Specifies if the result should be sorted.\n\n **kwargs : dict\n Optional keyword arguments to pass to ``aggfunc``.\n\n Returns\n -------\n DataFrame\n An Excel style pivot table.\n\n See Also\n --------\n DataFrame.pivot : Pivot without aggregation that can handle\n non-numeric data.\n DataFrame.melt: Unpivot a DataFrame from wide to long format,\n optionally leaving identifiers set.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"foo\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... \"bar\",\n ... ],\n ... \"B\": [\n ... \"one\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... \"one\",\n ... \"one\",\n ... \"two\",\n ... \"two\",\n ... ],\n ... \"C\": [\n ... \"small\",\n ... \"large\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... \"small\",\n ... \"small\",\n ... \"large\",\n ... ],\n ... \"D\": [1, 2, 2, 3, 3, 4, 5, 6, 7],\n ... \"E\": [2, 4, 5, 5, 6, 6, 8, 9, 9],\n ... }\n ... )\n >>> df\n A B C D E\n 0 foo one small 1 2\n 1 foo one large 2 4\n 2 foo one large 2 5\n 3 foo two small 3 5\n 4 foo two small 3 6\n 5 bar one large 4 6\n 6 bar one small 5 8\n 7 bar two small 6 9\n 8 bar two large 7 9\n\n This first example aggregates values by taking the sum.\n\n >>> table = pd.pivot_table(\n ... df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"], aggfunc=\"sum\"\n ... )\n >>> table\n C large small\n A B\n bar one 4.0 5.0\n two 7.0 6.0\n foo one 4.0 1.0\n two NaN 6.0\n\n We can also fill missing values using the `fill_value` parameter.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=\"D\",\n ... index=[\"A\", \"B\"],\n ... columns=[\"C\"],\n ... 
aggfunc=\"sum\",\n ... fill_value=0,\n ... )\n >>> table\n C large small\n A B\n bar one 4 5\n two 7 6\n foo one 4 1\n two 0 6\n\n The next example aggregates by taking the mean across multiple columns.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": \"mean\"},\n ... )\n >>> table\n D E\n A C\n bar large 5.500000 7.500000\n small 5.500000 8.500000\n foo large 2.000000 4.500000\n small 2.333333 4.333333\n\n We can also calculate multiple types of aggregations for any given\n value column.\n\n >>> table = pd.pivot_table(\n ... df,\n ... values=[\"D\", \"E\"],\n ... index=[\"A\", \"C\"],\n ... aggfunc={\"D\": \"mean\", \"E\": [\"min\", \"max\", \"mean\"]},\n ... )\n >>> table\n D E\n mean max mean min\n A C\n bar large 5.500000 9 7.500000 6\n small 5.500000 9 8.500000 8\n foo large 2.000000 5 4.500000 4\n small 2.333333 6 4.333333 2\n \"\"\"\n from pandas.core.reshape.pivot import pivot_table\n\n return pivot_table(\n self,\n values=values,\n index=index,\n columns=columns,\n aggfunc=aggfunc,\n fill_value=fill_value,\n margins=margins,\n dropna=dropna,\n margins_name=margins_name,\n observed=observed,\n sort=sort,\n **kwargs,\n )\n\n def stack(\n self,\n level: IndexLabel = -1,\n dropna: bool | lib.NoDefault = lib.no_default,\n sort: bool | lib.NoDefault = lib.no_default,\n future_stack: bool = True,\n ):\n \"\"\"\n Stack the prescribed level(s) from columns to index.\n\n Return a reshaped DataFrame or Series having a multi-level\n index with one or more new inner-most levels compared to the current\n DataFrame. 
The new inner-most levels are created by pivoting the\n columns of the current dataframe:\n\n - if the columns have a single level, the output is a Series;\n - if the columns have multiple levels, the new index level(s) is (are)\n taken from the prescribed level(s) and the output is a DataFrame.\n\n Parameters\n ----------\n level : int, str, list, default -1\n Level(s) to stack from the column axis onto the index\n axis, defined as one index or label, or a list of indices\n or labels.\n dropna : bool, default True\n Whether to drop rows in the resulting Frame/Series with\n missing values. Stacking a column level onto the index\n axis can create combinations of index and column values\n that are missing from the original dataframe. See Examples\n section.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n sort : bool, default True\n Whether to sort the levels of the resulting MultiIndex.\n\n .. deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n future_stack : bool, default True\n Whether to use the new stack implementation. This is the default\n as of pandas 3.0. When True, dropna and sort have no impact\n on the result and must remain unspecified. See :ref:`pandas 2.1.0 Release\n notes ` for more details.\n\n .. 
deprecated:: 3.0\n This parameter is deprecated and will be removed in a future\n version of pandas.\n\n Returns\n -------\n DataFrame or Series\n Stacked dataframe or series.\n\n See Also\n --------\n DataFrame.unstack : Unstack prescribed level(s) from index axis\n onto column axis.\n DataFrame.pivot : Reshape dataframe from long format to wide\n format.\n DataFrame.pivot_table : Create a spreadsheet-style pivot table\n as a DataFrame.\n\n Notes\n -----\n The function is named by analogy with a collection of books being\n reorganized from being side-by-side horizontally (the columns of the\n DataFrame) to being stacked vertically on top of each other (in the\n index of the DataFrame).\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n **Single level columns**\n\n >>> df_single_level_cols = pd.DataFrame(\n ... [[0, 1], [2, 3]], index=[\"cat\", \"dog\"], columns=[\"weight\", \"height\"]\n ... )\n\n Stacking a dataframe with a single level column axis returns a Series:\n\n >>> df_single_level_cols\n weight height\n cat 0 1\n dog 2 3\n >>> df_single_level_cols.stack()\n cat weight 0\n height 1\n dog weight 2\n height 3\n dtype: int64\n\n **Multi level columns: simple case**\n\n >>> multicol1 = pd.MultiIndex.from_tuples(\n ... [(\"weight\", \"kg\"), (\"weight\", \"pounds\")]\n ... )\n >>> df_multi_level_cols1 = pd.DataFrame(\n ... [[1, 2], [2, 4]], index=[\"cat\", \"dog\"], columns=multicol1\n ... )\n\n Stacking a dataframe with a multi-level column axis:\n\n >>> df_multi_level_cols1\n weight\n kg pounds\n cat 1 2\n dog 2 4\n >>> df_multi_level_cols1.stack()\n weight\n cat kg 1\n pounds 2\n dog kg 2\n pounds 4\n\n **Missing values**\n\n >>> multicol2 = pd.MultiIndex.from_tuples([(\"weight\", \"kg\"), (\"height\", \"m\")])\n >>> df_multi_level_cols2 = pd.DataFrame(\n ... [[1.0, 2.0], [3.0, 4.0]], index=[\"cat\", \"dog\"], columns=multicol2\n ... 
)\n\n It is common to have missing values when stacking a dataframe\n with multi-level columns, as the stacked dataframe typically\n has more values than the original dataframe. Missing values\n are filled with NaNs:\n\n >>> df_multi_level_cols2\n weight height\n kg m\n cat 1.0 2.0\n dog 3.0 4.0\n >>> df_multi_level_cols2.stack()\n weight height\n cat kg 1.0 NaN\n m NaN 2.0\n dog kg 3.0 NaN\n m NaN 4.0\n\n **Prescribing the level(s) to be stacked**\n\n The first parameter controls which level or levels are stacked:\n\n >>> df_multi_level_cols2.stack(0)\n kg m\n cat weight 1.0 NaN\n height NaN 2.0\n dog weight 3.0 NaN\n height NaN 4.0\n >>> df_multi_level_cols2.stack([0, 1])\n cat weight kg 1.0\n height m 2.0\n dog weight kg 3.0\n height m 4.0\n dtype: float64\n \"\"\"\n if not future_stack:\n from pandas.core.reshape.reshape import (\n stack,\n stack_multiple,\n )\n\n warnings.warn(\n \"The previous implementation of stack is deprecated and will be \"\n \"removed in a future version of pandas. See the What's New notes \"\n \"for pandas 2.1.0 for details. Do not specify the future_stack \"\n \"argument to adopt the new implementation and silence this warning.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n if dropna is lib.no_default:\n dropna = True\n if sort is lib.no_default:\n sort = True\n\n if isinstance(level, (tuple, list)):\n result = stack_multiple(self, level, dropna=dropna, sort=sort)\n else:\n result = stack(self, level, dropna=dropna, sort=sort)\n else:\n from pandas.core.reshape.reshape import stack_v3\n\n if dropna is not lib.no_default:\n raise ValueError(\n \"dropna must be unspecified as the new \"\n \"implementation does not introduce rows of NA values. This \"\n \"argument will be removed in a future version of pandas.\"\n )\n\n if sort is not lib.no_default:\n raise ValueError(\n \"Cannot specify sort, this argument will be \"\n \"removed in a future version of pandas. 
Sort the result using \"\n \".sort_index instead.\"\n )\n\n if (\n isinstance(level, (tuple, list))\n and not all(lev in self.columns.names for lev in level)\n and not all(isinstance(lev, int) for lev in level)\n ):\n raise ValueError(\n \"level should contain all level names or all level \"\n \"numbers, not a mixture of the two.\"\n )\n\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = [self.columns._get_level_number(lev) for lev in level]\n result = stack_v3(self, level)\n\n return result.__finalize__(self, method=\"stack\")\n\n def explode(\n self,\n column: IndexLabel,\n ignore_index: bool = False,\n ) -> DataFrame:\n \"\"\"\n Transform each element of a list-like to a row, replicating index values.\n\n This method is useful for expanding nested data structures like lists\n into separate rows while maintaining the relationship with other columns.\n\n Parameters\n ----------\n column : IndexLabel\n Column(s) to explode.\n For multiple columns, specify a non-empty list in which each element\n is a str or tuple; the list-like data of all specified columns\n on the same row of the frame must have matching length.\n\n ignore_index : bool, default False\n If True, the resulting index will be labeled 0, 1, …, n - 1.\n\n Returns\n -------\n DataFrame\n Exploded lists to rows of the subset columns;\n index will be duplicated for these rows.\n\n Raises\n ------\n ValueError :\n * If columns of the frame are not unique.\n * If the list of columns to explode is empty.\n * If the specified columns to explode do not have matching counts of\n elements row-wise in the frame.\n\n See Also\n --------\n DataFrame.unstack : Pivot a level of the (necessarily hierarchical)\n index labels.\n DataFrame.melt : Unpivot a DataFrame from wide format to long format.\n Series.explode : Transform each element of a list-like to a row.\n\n Notes\n -----\n This routine will explode list-likes including lists, tuples, sets,\n Series, and np.ndarray. 
The result dtype of the subset rows will\n be object. Scalars will be returned unchanged, and empty list-likes will\n result in a np.nan for that row. In addition, the ordering of rows in the\n output will be non-deterministic when exploding sets.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [[0, 1, 2], \"foo\", [], [3, 4]],\n ... \"B\": 1,\n ... \"C\": [[\"a\", \"b\", \"c\"], np.nan, [], [\"d\", \"e\"]],\n ... }\n ... )\n >>> df\n A B C\n 0 [0, 1, 2] 1 [a, b, c]\n 1 foo 1 NaN\n 2 [] 1 []\n 3 [3, 4] 1 [d, e]\n\n Single-column explode.\n\n >>> df.explode(\"A\")\n A B C\n 0 0 1 [a, b, c]\n 0 1 1 [a, b, c]\n 0 2 1 [a, b, c]\n 1 foo 1 NaN\n 2 NaN 1 []\n 3 3 1 [d, e]\n 3 4 1 [d, e]\n\n Multi-column explode.\n\n >>> df.explode(list(\"AC\"))\n A B C\n 0 0 1 a\n 0 1 1 b\n 0 2 1 c\n 1 foo 1 NaN\n 2 NaN 1 NaN\n 3 3 1 d\n 3 4 1 e\n \"\"\"\n if not self.columns.is_unique:\n duplicate_cols = self.columns[self.columns.duplicated()].tolist()\n raise ValueError(\n f\"DataFrame columns must be unique. 
Duplicate columns: {duplicate_cols}\"\n )\n\n columns: list[Hashable]\n if is_scalar(column) or isinstance(column, tuple):\n columns = [column]\n elif isinstance(column, list) and all(\n is_scalar(c) or isinstance(c, tuple) for c in column\n ):\n if not column:\n raise ValueError(\"column must be nonempty\")\n if len(column) > len(set(column)):\n raise ValueError(\"column must be unique\")\n columns = column\n else:\n raise ValueError(\"column must be a scalar, tuple, or list thereof\")\n\n df = self.reset_index(drop=True)\n if len(columns) == 1:\n result = df[columns[0]].explode()\n else:\n mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1\n counts0 = self[columns[0]].apply(mylen)\n for c in columns[1:]:\n if not all(counts0 == self[c].apply(mylen)):\n raise ValueError(\"columns must have matching element counts\")\n result = DataFrame({c: df[c].explode() for c in columns})\n result = df.drop(columns, axis=1).join(result)\n if ignore_index:\n result.index = default_index(len(result))\n else:\n result.index = self.index.take(result.index) # type: ignore[arg-type]\n result = result.reindex(columns=self.columns)\n\n return result.__finalize__(self, method=\"explode\")\n\n def unstack(\n self, level: IndexLabel = -1, fill_value=None, sort: bool = True\n ) -> DataFrame | Series:\n \"\"\"\n Pivot a level of the (necessarily hierarchical) index labels.\n\n Returns a DataFrame having a new level of column labels whose inner-most level\n consists of the pivoted index labels.\n\n If the index is not a MultiIndex, the output will be a Series\n (the analogue of stack when the columns are not a MultiIndex).\n\n Parameters\n ----------\n level : int, str, or list of these, default -1 (last level)\n Level(s) of index to unstack, can pass level name.\n fill_value : scalar\n Replace NaN with this value if the unstack produces missing values.\n sort : bool, default True\n Sort the level(s) in the resulting MultiIndex columns.\n\n Returns\n -------\n Series or 
DataFrame\n If index is a MultiIndex: DataFrame with pivoted index labels as new\n inner-most level column labels, else Series.\n\n See Also\n --------\n DataFrame.pivot : Pivot a table based on column values.\n DataFrame.stack : Pivot a level of the column labels (inverse operation\n from `unstack`).\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> index = pd.MultiIndex.from_tuples(\n ... [(\"one\", \"a\"), (\"one\", \"b\"), (\"two\", \"a\"), (\"two\", \"b\")]\n ... )\n >>> s = pd.Series(np.arange(1.0, 5.0), index=index)\n >>> s\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n\n >>> s.unstack(level=-1)\n a b\n one 1.0 2.0\n two 3.0 4.0\n\n >>> s.unstack(level=0)\n one two\n a 1.0 3.0\n b 2.0 4.0\n\n >>> df = s.unstack(level=0)\n >>> df.unstack()\n one a 1.0\n b 2.0\n two a 3.0\n b 4.0\n dtype: float64\n \"\"\"\n from pandas.core.reshape.reshape import unstack\n\n result = unstack(self, level, fill_value, sort)\n\n return result.__finalize__(self, method=\"unstack\")\n\n def melt(\n self,\n id_vars=None,\n value_vars=None,\n var_name=None,\n value_name: Hashable = \"value\",\n col_level: Level | None = None,\n ignore_index: bool = True,\n ) -> DataFrame:\n \"\"\"\n Unpivot DataFrame from wide to long format, optionally leaving identifiers set.\n\n This function is useful to massage a DataFrame into a format where one\n or more columns are identifier variables (`id_vars`), while all other\n columns, considered measured variables (`value_vars`), are \"unpivoted\" to\n the row axis, leaving just two non-identifier columns, 'variable' and\n 'value'.\n\n Parameters\n ----------\n id_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to use as identifier variables.\n value_vars : scalar, tuple, list, or ndarray, optional\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.\n var_name : scalar, default None\n Name to use for the 'variable' column. 
If None it uses\n ``frame.columns.name`` or 'variable'.\n value_name : scalar, default 'value'\n Name to use for the 'value' column, can't be an existing column label.\n col_level : scalar, optional\n If columns are a MultiIndex then use this level to melt.\n ignore_index : bool, default True\n If True, original index is ignored. If False, original index is retained.\n Index labels will be repeated as necessary.\n\n Returns\n -------\n DataFrame\n Unpivoted DataFrame.\n\n See Also\n --------\n melt : Identical method.\n pivot_table : Create a spreadsheet-style pivot table as a DataFrame.\n DataFrame.pivot : Return reshaped DataFrame organized\n by given index / column values.\n DataFrame.explode : Explode a DataFrame from list-like\n columns to long format.\n\n Notes\n -----\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": {0: \"a\", 1: \"b\", 2: \"c\"},\n ... \"B\": {0: 1, 1: 3, 2: 5},\n ... \"C\": {0: 2, 1: 4, 2: 6},\n ... }\n ... )\n >>> df\n A B C\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 3 a C 2\n 4 b C 4\n 5 c C 6\n\n The names of 'variable' and 'value' columns can be customized:\n\n >>> df.melt(\n ... id_vars=[\"A\"],\n ... value_vars=[\"B\"],\n ... var_name=\"myVarname\",\n ... value_name=\"myValname\",\n ... 
)\n A myVarname myValname\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n Original index values can be kept around:\n\n >>> df.melt(id_vars=[\"A\"], value_vars=[\"B\", \"C\"], ignore_index=False)\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n 0 a C 2\n 1 b C 4\n 2 c C 6\n\n If you have multi-index columns:\n\n >>> df.columns = [list(\"ABC\"), list(\"DEF\")]\n >>> df\n A B C\n D E F\n 0 a 1 2\n 1 b 3 4\n 2 c 5 6\n\n >>> df.melt(col_level=0, id_vars=[\"A\"], value_vars=[\"B\"])\n A variable value\n 0 a B 1\n 1 b B 3\n 2 c B 5\n\n >>> df.melt(id_vars=[(\"A\", \"D\")], value_vars=[(\"B\", \"E\")])\n (A, D) variable_0 variable_1 value\n 0 a B E 1\n 1 b B E 3\n 2 c B E 5\n \"\"\"\n return melt(\n self,\n id_vars=id_vars,\n value_vars=value_vars,\n var_name=var_name,\n value_name=value_name,\n col_level=col_level,\n ignore_index=ignore_index,\n ).__finalize__(self, method=\"melt\")\n\n # ----------------------------------------------------------------------\n # Time series-related\n\n def diff(self, periods: int = 1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n First discrete difference of element.\n\n Calculates the difference of a DataFrame element compared with another\n element in the DataFrame (default is element in previous row).\n\n Parameters\n ----------\n periods : int, default 1\n Periods to shift for calculating difference, accepts negative\n values.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Take difference over rows (0) or columns (1).\n\n Returns\n -------\n DataFrame\n First differences of the DataFrame.\n\n See Also\n --------\n DataFrame.pct_change: Percent change over given number of periods.\n DataFrame.shift: Shift index by desired number of periods with an\n optional time freq.\n Series.diff: First discrete difference of object.\n\n Notes\n -----\n For boolean dtypes, this uses :meth:`operator.xor` rather than\n :meth:`operator.sub`.\n The result is calculated according to current dtype in DataFrame,\n however dtype of the result is always float64.\n\n 
Examples\n --------\n\n Difference with previous row\n\n >>> df = pd.DataFrame(\n ... {\n ... \"a\": [1, 2, 3, 4, 5, 6],\n ... \"b\": [1, 1, 2, 3, 5, 8],\n ... \"c\": [1, 4, 9, 16, 25, 36],\n ... }\n ... )\n >>> df\n a b c\n 0 1 1 1\n 1 2 1 4\n 2 3 2 9\n 3 4 3 16\n 4 5 5 25\n 5 6 8 36\n >>> df.diff()\n a b c\n 0 NaN NaN NaN\n 1 1.0 0.0 3.0\n 2 1.0 1.0 5.0\n 3 1.0 1.0 7.0\n 4 1.0 2.0 9.0\n 5 1.0 3.0 11.0\n\n Difference with previous column\n\n >>> df.diff(axis=1)\n a b c\n 0 NaN 0 0\n 1 NaN -1 3\n 2 NaN -1 7\n 3 NaN -1 13\n 4 NaN 0 20\n 5 NaN 2 28\n\n Difference with 3rd previous row\n\n >>> df.diff(periods=3)\n a b c\n 0 NaN NaN NaN\n 1 NaN NaN NaN\n 2 NaN NaN NaN\n 3 3.0 2.0 15.0\n 4 3.0 4.0 21.0\n 5 3.0 6.0 27.0\n\n Difference with following row\n\n >>> df.diff(periods=-1)\n a b c\n 0 -1.0 0.0 -3.0\n 1 -1.0 -1.0 -5.0\n 2 -1.0 -1.0 -7.0\n 3 -1.0 -2.0 -9.0\n 4 -1.0 -3.0 -11.0\n 5 NaN NaN NaN\n\n Overflow in input dtype\n\n >>> df = pd.DataFrame({\"a\": [1, 0]}, dtype=np.uint8)\n >>> df.diff()\n a\n 0 NaN\n 1 255.0\n \"\"\"\n if not lib.is_integer(periods):\n if not (is_float(periods) and periods.is_integer()):\n raise ValueError(\"periods must be an integer\")\n periods = int(periods)\n\n axis = self._get_axis_number(axis)\n if axis == 1:\n if periods != 0:\n # in the periods == 0 case, this is equivalent diff of 0 periods\n # along axis=0, and the Manager method may be somewhat more\n # performant, so we dispatch in that case.\n return self - self.shift(periods, axis=axis)\n # With periods=0 this is equivalent to a diff with axis=0\n axis = 0\n\n new_data = self._mgr.diff(n=periods)\n res_df = self._constructor_from_mgr(new_data, axes=new_data.axes)\n return res_df.__finalize__(self, \"diff\")\n\n # ----------------------------------------------------------------------\n # Function application\n\n def _gotitem(\n self,\n key: IndexLabel,\n ndim: int,\n subset: DataFrame | Series | None = None,\n ) -> DataFrame | Series:\n \"\"\"\n Sub-classes to define. 
Return a sliced object.\n\n Parameters\n ----------\n key : string / list of selections\n ndim : {1, 2}\n requested ndim of result\n subset : object, default None\n subset to act on\n \"\"\"\n if subset is None:\n subset = self\n elif subset.ndim == 1: # is Series\n return subset\n\n return subset[key]\n\n def aggregate(\n self, func=None, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame | Series:\n \"\"\"\n Aggregate using one or more operations over the specified axis.\n\n This method allows combining multiple aggregation functions at once,\n such as ``sum``, ``mean``, and ``min``, and can apply them either\n per-column or per-row. It accepts functions as strings, callables,\n lists, or dictionaries mapping column labels to the desired\n aggregation(s).\n\n Parameters\n ----------\n func : function, str, list or dict\n Function to use for aggregating the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list of functions and/or function names, e.g. 
``[np.sum, 'mean']``\n - dict of axis labels -> functions, function names or list of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n scalar, Series or DataFrame\n\n The return can be:\n\n * scalar : when Series.agg is called with single function\n * Series : when DataFrame.agg is called with a single function\n * DataFrame : when DataFrame.agg is called with several functions\n\n See Also\n --------\n DataFrame.apply : Perform any type of operations.\n DataFrame.transform : Perform transformation type operations.\n DataFrame.groupby : Perform operations over groups.\n DataFrame.resample : Perform operations over resampled bins.\n DataFrame.rolling : Perform operations over rolling window.\n DataFrame.expanding : Perform operations over expanding window.\n core.window.ewm.ExponentialMovingWindow : Perform operation over exponential\n weighted window.\n\n Notes\n -----\n The aggregation operations are always performed over an axis, either the\n index (default) or the column axis. This behavior is different from\n `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,\n `var`), where the default is to compute the aggregation of the flattened\n array, e.g., ``numpy.mean(arr_2d)`` as opposed to\n ``numpy.mean(arr_2d, axis=0)``.\n\n `agg` is an alias for `aggregate`. Use the alias.\n\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n A passed user-defined-function will be passed a Series for evaluation.\n\n If ``func`` defines an index relabeling, ``axis`` must be ``0`` or ``index``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],\n ... 
columns=[\"A\", \"B\", \"C\"],\n ... )\n\n Aggregate these functions over the rows.\n\n >>> df.agg([\"sum\", \"min\"])\n A B C\n sum 12.0 15.0 18.0\n min 1.0 2.0 3.0\n\n Different aggregations per column.\n\n >>> df.agg({\"A\": [\"sum\", \"min\"], \"B\": [\"min\", \"max\"]})\n A B\n sum 12.0 NaN\n min 1.0 2.0\n max NaN 8.0\n\n Aggregate different functions over the columns and rename the index of\n the resulting DataFrame.\n\n >>> df.agg(x=(\"A\", \"max\"), y=(\"B\", \"min\"), z=(\"C\", \"mean\"))\n A B C\n x 7.0 NaN NaN\n y NaN 2.0 NaN\n z NaN NaN 6.0\n\n Aggregate over the columns.\n\n >>> df.agg(\"mean\", axis=\"columns\")\n 0 2.0\n 1 5.0\n 2 8.0\n 3 NaN\n dtype: float64\n \"\"\"\n from pandas.core.apply import frame_apply\n\n axis = self._get_axis_number(axis)\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.agg()\n result = reconstruct_and_relabel_result(result, func, **kwargs)\n return result\n\n agg = aggregate\n\n def transform(\n self, func: AggFuncType, axis: Axis = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Call ``func`` on self producing a DataFrame with the same axis shape as self.\n\n Unlike aggregation, transformation preserves the shape of the input.\n The provided function must return a result that is the same size as\n the input along the specified axis, raising a ``ValueError`` otherwise.\n\n Parameters\n ----------\n func : function, str, list-like or dict-like\n Function to use for transforming the data. If a function, must either\n work when passed a DataFrame or when passed to DataFrame.apply. If func\n is both list-like and dict-like, dict-like behavior takes precedence.\n\n Accepted combinations are:\n\n - function\n - string function name\n - list-like of functions and/or function names, e.g. 
``[np.exp, 'sqrt']``\n - dict-like of axis labels -> functions, function names or list-like\n of such.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index': apply function to each column.\n If 1 or 'columns': apply function to each row.\n *args\n Positional arguments to pass to `func`.\n **kwargs\n Keyword arguments to pass to `func`.\n\n Returns\n -------\n DataFrame\n A DataFrame that must have the same length as self.\n\n Raises\n ------\n ValueError : If the returned DataFrame has a different length than self.\n\n See Also\n --------\n DataFrame.agg : Only perform aggregating type operations.\n DataFrame.apply : Invoke function on a DataFrame.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": range(3), \"B\": range(1, 4)})\n >>> df\n A B\n 0 0 1\n 1 1 2\n 2 2 3\n >>> df.transform(lambda x: x + 1)\n A B\n 0 1 2\n 1 2 3\n 2 3 4\n\n Even though the resulting DataFrame must have the same length as the\n input DataFrame, it is possible to provide several input functions:\n\n >>> s = pd.Series(range(3))\n >>> s\n 0 0\n 1 1\n 2 2\n dtype: int64\n >>> s.transform([np.sqrt, np.exp])\n sqrt exp\n 0 0.000000 1.000000\n 1 1.000000 2.718282\n 2 1.414214 7.389056\n\n You can call transform on a GroupBy object:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Date\": [\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... \"2015-05-08\",\n ... \"2015-05-07\",\n ... \"2015-05-06\",\n ... \"2015-05-05\",\n ... ],\n ... \"Data\": [5, 8, 6, 1, 50, 100, 60, 120],\n ... }\n ... 
)\n >>> df\n Date Data\n 0 2015-05-08 5\n 1 2015-05-07 8\n 2 2015-05-06 6\n 3 2015-05-05 1\n 4 2015-05-08 50\n 5 2015-05-07 100\n 6 2015-05-06 60\n 7 2015-05-05 120\n >>> df.groupby(\"Date\")[\"Data\"].transform(\"sum\")\n 0 55\n 1 108\n 2 66\n 3 121\n 4 55\n 5 108\n 6 66\n 7 121\n Name: Data, dtype: int64\n\n >>> df = pd.DataFrame(\n ... {\n ... \"c\": [1, 1, 1, 2, 2, 2, 2],\n ... \"type\": [\"m\", \"n\", \"o\", \"m\", \"m\", \"n\", \"n\"],\n ... }\n ... )\n >>> df\n c type\n 0 1 m\n 1 1 n\n 2 1 o\n 3 2 m\n 4 2 m\n 5 2 n\n 6 2 n\n >>> df[\"size\"] = df.groupby(\"c\")[\"type\"].transform(len)\n >>> df\n c type size\n 0 1 m 3\n 1 1 n 3\n 2 1 o 3\n 3 2 m 4\n 4 2 m 4\n 5 2 n 4\n 6 2 n 4\n \"\"\"\n from pandas.core.apply import frame_apply\n\n op = frame_apply(self, func=func, axis=axis, args=args, kwargs=kwargs)\n result = op.transform()\n assert isinstance(result, DataFrame)\n return result\n\n def apply(\n self,\n func: AggFuncType,\n axis: Axis = 0,\n raw: bool = False,\n result_type: Literal[\"expand\", \"reduce\", \"broadcast\"] | None = None,\n args=(),\n by_row: Literal[False, \"compat\"] = \"compat\",\n engine: Callable | None | Literal[\"python\", \"numba\"] = None,\n engine_kwargs: dict[str, bool] | None = None,\n **kwargs,\n ):\n \"\"\"\n Apply a function along an axis of the DataFrame.\n\n Objects passed to the function are Series objects whose index is\n either the DataFrame's index (``axis=0``) or the DataFrame's columns\n (``axis=1``). However, by default (``by_row=\"compat\"``), if ``func``\n is a list-like or dict-like of functions, each function is first\n applied to the individual values of the Series rather than the Series\n itself; if this fails, pandas retries by passing the entire Series.\n By default (``result_type=None``), the final return type is inferred\n from the return type of the applied function. Otherwise, it depends\n on the `result_type` argument. 
The return type of the applied function\n is inferred based on the first computed result obtained after applying\n the function to a Series object.\n\n Parameters\n ----------\n func : function\n Function to apply to each column or row.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis along which the function is applied:\n\n * 0 or 'index': apply function to each column.\n * 1 or 'columns': apply function to each row.\n\n raw : bool, default False\n Determines if row or column is passed as a Series or ndarray object:\n\n * ``False`` : passes each row or column as a Series to the\n function.\n * ``True`` : the passed function will receive ndarray objects\n instead.\n If you are just applying a NumPy reduction function this will\n achieve much better performance.\n\n .. note::\n\n When ``raw=True``, the result dtype is inferred from the **first**\n returned value.\n\n result_type : {'expand', 'reduce', 'broadcast', None}, default None\n How to interpret list-like results from `func`:\n\n * 'expand' : list-like results will be turned into columns.\n * 'reduce' : returns a Series if possible rather than expanding\n list-like results. This is the opposite of 'expand'.\n * 'broadcast' : results will be broadcast to the original shape\n of the DataFrame, the original index and columns will be\n retained.\n\n The default behaviour (None) depends on the return value of the\n applied function: list-like results will be returned as a Series\n of those. However if the apply function returns a Series these\n are expanded to columns.\n\n .. note::\n\n ``result_type`` has no effect when ``func`` is a NumPy\n universal function (e.g. ``np.sqrt``). In that case the\n ufunc is applied directly to the underlying values and the\n result has the same shape as the input, regardless of\n ``axis`` or ``result_type``. 
To use ``result_type`` with a\n ufunc, wrap it in a Python function (e.g.\n ``lambda x: np.sqrt(x)``).\n args : tuple\n Positional arguments to pass to `func` in addition to the\n array/series.\n by_row : False or \"compat\", default \"compat\"\n Only has an effect when ``func`` is a listlike or dictlike of funcs\n and the func isn't a string.\n If \"compat\", will if possible first translate the func into pandas\n methods (e.g. ``Series().apply(np.sum)`` will be translated to\n ``Series().sum()``). If that doesn't work, will try call to apply again with\n ``by_row=True`` and if that fails, will call apply again with\n ``by_row=False`` (backward compatible).\n If False, the funcs will be passed the whole Series at once.\n\n .. versionadded:: 2.1.0\n\n engine : decorator or {'python', 'numba'}, optional\n Choose the execution engine to use. If not provided the function\n will be executed by the regular Python interpreter.\n\n Other options include JIT compilers such as Numba and Bodo, which in some\n cases can speed up the execution. To use an executor you can provide\n the decorators ``numba.jit``, ``numba.njit`` or ``bodo.jit``. You can\n also provide the decorator with parameters, like ``numba.jit(nogil=True)``.\n\n Not all functions can be executed with all execution engines. In general,\n JIT compilers will require type stability in the function (no variable\n should change data type during the execution). And not all pandas and\n NumPy APIs are supported. Check the engine documentation [1]_ and [2]_\n for limitations.\n\n .. warning::\n\n String parameters will stop being supported in a future pandas version.\n\n .. 
versionadded:: 2.2.0\n\n engine_kwargs : dict\n Pass keyword arguments to the engine.\n This is currently only used by the numba engine,\n see the documentation for the engine argument for more information.\n\n **kwargs\n Additional keyword arguments to pass as keywords arguments to\n `func`.\n\n Returns\n -------\n Series or DataFrame\n Result of applying ``func`` along the given axis of the\n DataFrame.\n\n See Also\n --------\n DataFrame.map: For elementwise operations.\n DataFrame.aggregate: Only perform aggregating type operations.\n DataFrame.transform: Only perform transforming type operations.\n\n Notes\n -----\n Functions that mutate the passed object can produce unexpected\n behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`\n for more details.\n\n References\n ----------\n .. [1] `Numba documentation\n `_\n .. [2] `Bodo documentation\n `/\n\n Examples\n --------\n >>> df = pd.DataFrame([[4, 9]] * 3, columns=[\"A\", \"B\"])\n >>> df\n A B\n 0 4 9\n 1 4 9\n 2 4 9\n\n Using a numpy universal function (in this case the same as\n ``np.sqrt(df)``):\n\n >>> df.apply(np.sqrt)\n A B\n 0 2.0 3.0\n 1 2.0 3.0\n 2 2.0 3.0\n\n Using a reducing function on either axis\n\n >>> df.apply(np.sum, axis=0)\n A 12\n B 27\n dtype: int64\n\n >>> df.apply(np.sum, axis=1)\n 0 13\n 1 13\n 2 13\n dtype: int64\n\n Returning a list-like will result in a Series\n\n >>> df.apply(lambda x: [1, 2], axis=1)\n 0 [1, 2]\n 1 [1, 2]\n 2 [1, 2]\n dtype: object\n\n Passing ``result_type='expand'`` will expand list-like results\n to columns of a Dataframe\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"expand\")\n 0 1\n 0 1 2\n 1 1 2\n 2 1 2\n\n Returning a Series inside the function is similar to passing\n ``result_type='expand'``. 
The resulting column names\n will be the Series index.\n\n >>> df.apply(lambda x: pd.Series([1, 2], index=[\"foo\", \"bar\"]), axis=1)\n foo bar\n 0 1 2\n 1 1 2\n 2 1 2\n\n Passing ``result_type='broadcast'`` will ensure the same shape\n result, whether list-like or scalar is returned by the function,\n and broadcast it along the axis. The resulting column names will\n be the originals.\n\n >>> df.apply(lambda x: [1, 2], axis=1, result_type=\"broadcast\")\n A B\n 0 1 2\n 1 1 2\n 2 1 2\n\n Advanced users can speed up their code by using a Just-in-time (JIT) compiler\n with ``apply``. The main JIT compilers available for pandas are Numba and Bodo.\n In general, JIT compilation is only possible when the function passed to\n ``apply`` has type stability (variables in the function do not change their\n type during the execution).\n\n >>> import bodo # doctest: +SKIP\n >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit) # doctest: +SKIP\n\n Note that JIT compilation is only recommended for functions that take a\n significant amount of time to run. 
Fast functions are unlikely to run faster\n with JIT compilation.\n \"\"\"\n if engine is None or isinstance(engine, str):\n from pandas.core.apply import frame_apply\n\n if engine is None:\n engine = \"python\"\n\n if engine not in [\"python\", \"numba\"]:\n raise ValueError(f\"Unknown engine '{engine}'\")\n\n op = frame_apply(\n self,\n func=func,\n axis=axis,\n raw=raw,\n result_type=result_type,\n by_row=by_row,\n engine=engine,\n engine_kwargs=engine_kwargs,\n args=args,\n kwargs=kwargs,\n )\n return op.apply().__finalize__(self, method=\"apply\")\n elif hasattr(engine, \"__pandas_udf__\"):\n if result_type is not None:\n raise NotImplementedError(\n f\"{result_type=} only implemented for the default engine\"\n )\n\n agg_axis = self._get_agg_axis(self._get_axis_number(axis))\n\n # one axis is empty\n if not all(self.shape):\n func = cast(\"Callable\", func)\n try:\n if axis == 0:\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = func(\n Series(index=self.columns, dtype=np.float64),\n *args,\n **kwargs,\n )\n except Exception:\n pass\n else:\n if not isinstance(r, Series):\n if len(agg_axis):\n r = func(Series([], dtype=np.float64), *args, **kwargs)\n else:\n r = np.nan\n\n return self._constructor_sliced(r, index=agg_axis)\n return self.copy()\n\n data: DataFrame | np.ndarray = self\n if raw:\n # This will upcast the whole DataFrame to the same type,\n # and likely result in an object 2D array.\n # We should probably pass a list of 1D arrays instead, at\n # least for ``axis=0``\n data = self.values\n result = engine.__pandas_udf__.apply(\n data=data,\n func=func,\n args=args,\n kwargs=kwargs,\n decorator=engine,\n axis=axis,\n )\n if raw:\n if result.ndim == 2:\n return self._constructor(\n result, index=self.index, columns=self.columns\n )\n else:\n return self._constructor_sliced(result, index=agg_axis)\n return result\n else:\n raise ValueError(f\"Unknown engine {engine}\")\n\n def map(\n self, func: PythonFuncType, na_action: 
Literal[\"ignore\"] | None = None, **kwargs\n ) -> DataFrame:\n \"\"\"\n Apply a function to a DataFrame elementwise.\n\n .. versionadded:: 2.1.0\n\n DataFrame.applymap was deprecated and renamed to DataFrame.map.\n\n This method applies a function that accepts and returns a scalar\n to every element of a DataFrame.\n\n Parameters\n ----------\n func : callable\n Python function, returns a single value from a single value.\n na_action : {None, 'ignore'}, default None\n If 'ignore', propagate NaN values, without passing them to func.\n **kwargs\n Additional keyword arguments to pass as keyword arguments to\n `func`.\n\n Returns\n -------\n DataFrame\n Transformed DataFrame.\n\n See Also\n --------\n DataFrame.apply : Apply a function along input axis of DataFrame.\n DataFrame.replace: Replace values given in `to_replace` with `value`.\n Series.map : Apply a function elementwise on a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])\n >>> df\n 0 1\n 0 1.000 2.120\n 1 3.356 4.567\n\n >>> df.map(lambda x: len(str(x)))\n 0 1\n 0 3 4\n 1 5 5\n\n Like Series.map, NA values can be ignored:\n\n >>> df_copy = df.copy()\n >>> df_copy.iloc[0, 0] = pd.NA\n >>> df_copy.map(lambda x: len(str(x)), na_action=\"ignore\")\n 0 1\n 0 NaN 4\n 1 5.0 5\n\n It is also possible to use `map` with functions that are not\n `lambda` functions:\n\n >>> df.map(round, ndigits=1)\n 0 1\n 0 1.0 2.1\n 1 3.4 4.6\n\n Note that a vectorized version of `func` often exists, which will\n be much faster. You could square each number elementwise.\n\n >>> df.map(lambda x: x**2)\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n\n But it's better to avoid map in that case.\n\n >>> df**2\n 0 1\n 0 1.000000 4.494400\n 1 11.262736 20.857489\n \"\"\"\n if na_action not in {\"ignore\", None}:\n raise ValueError(f\"na_action must be 'ignore' or None. 
Got {na_action!r}\")\n\n if self.empty:\n return self.copy()\n\n func = functools.partial(func, **kwargs)\n\n def infer(x):\n return x._map_values(func, na_action=na_action)\n\n return self.apply(infer).__finalize__(self, \"map\")\n\n # ----------------------------------------------------------------------\n # Merging / joining methods\n\n def _append_internal(\n self,\n other: Series,\n ignore_index: bool = False,\n ) -> DataFrame:\n assert isinstance(other, Series), type(other)\n\n if other.name is None and not ignore_index:\n raise TypeError(\n \"Can only append a Series if ignore_index=True \"\n \"or if the Series has a name\"\n )\n\n index = Index(\n [other.name],\n name=(\n self.index.names\n if isinstance(self.index, MultiIndex)\n else self.index.name\n ),\n )\n\n row_df = other.to_frame().T\n if isinstance(self.index.dtype, ExtensionDtype):\n # GH#41626 retain e.g. CategoricalDtype if reached via\n # df.loc[key] = item\n row_df.index = self.index.array._cast_pointwise_result(row_df.index._values)\n\n # infer_objects is needed for\n # test_append_empty_frame_to_series_with_dateutil_tz\n row_df = row_df.infer_objects().rename_axis(index.names)\n\n if len(row_df.columns) == len(self.columns):\n # Pre-cast the row's value to the original column dtype where the\n # row's inferred dtype would otherwise force concat to widen the\n # whole column. This avoids an O(N) materialize-and-rebuild\n # roundtrip in _post_expansion_casting, and (for EA dtypes that\n # carry array-level state not encoded in the dtype, e.g. geopandas\n # CRS) preserves that state through concat. 
GH#65094.\n orig_dtypes = self._mgr.get_dtypes()\n row_dtypes = row_df._mgr.get_dtypes()\n object_dtype = np.dtype(object)\n for i in range(len(self.columns)):\n orig_dtype = orig_dtypes[i]\n if row_dtypes[i] == orig_dtype:\n continue\n if orig_dtype == object_dtype:\n # concat object + anything stays object; post-cast is a\n # no-op, so pre-casting would only add overhead.\n continue\n arr = self._get_column_array(i)\n if isinstance(arr, np.ndarray):\n # infer_and_maybe_downcast expects an EA as its first\n # argument so it can dispatch to _cast_pointwise_result.\n arr = NumpyExtensionArray(arr)\n casted = infer_and_maybe_downcast(arr, row_df._mgr.iget_values(i))\n row_df.isetitem(i, casted)\n\n from pandas.core.reshape.concat import concat\n\n result = concat(\n [self, row_df],\n ignore_index=ignore_index,\n )\n return result.__finalize__(self, method=\"append\")\n\n def join(\n self,\n other: DataFrame | Series | Iterable[DataFrame | Series],\n on: IndexLabel | None = None,\n how: MergeHow = \"left\",\n lsuffix: str = \"\",\n rsuffix: str = \"\",\n sort: bool = False,\n validate: JoinValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Join columns of another DataFrame.\n\n Join columns with `other` DataFrame either on index or on a key\n column. Efficiently join multiple DataFrame objects by index at once by\n passing a list.\n\n Parameters\n ----------\n other : DataFrame, Series, or a list containing any combination of them\n Index should be similar to one of the columns in the caller. If a\n Series is passed, its name attribute must be set, and that will be\n used as the column name in the resulting joined DataFrame.\n on : str, list of str, or array-like, optional\n Column or index level name(s) in the caller to join on the index\n in `other`, otherwise joins index-on-index. If multiple\n values given, the `other` DataFrame must have a MultiIndex. Can\n pass an array as the join key if it is not already contained in\n the calling DataFrame. 
Like an Excel VLOOKUP operation.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'left'\n How to handle the operation of the two objects.\n\n * left: use calling frame's index (or column if on is specified)\n * right: use `other`'s index.\n * outer: form union of calling frame's index (or column if on is\n specified) with `other`'s index, and sort it lexicographically.\n * inner: form intersection of calling frame's index (or column if\n on is specified) with `other`'s index, preserving the order\n of the calling's one.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use set difference of calling frame's index and `other`'s\n index.\n * right_anti: use set difference of `other`'s index and calling frame's\n index.\n lsuffix : str, default ''\n Suffix to use from left frame's overlapping columns.\n rsuffix : str, default ''\n Suffix to use from right frame's overlapping columns.\n sort : bool, default False\n Order result DataFrame lexicographically by the join key. If False,\n the order of the join key depends on the join type (how keyword).\n validate : str, optional\n If specified, checks if join is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if join keys are unique in both left\n and right datasets.\n * \"one_to_many\" or \"1:m\": check if join keys are unique in left dataset.\n * \"many_to_one\" or \"m:1\": check if join keys are unique in right dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A dataframe containing columns from both the caller and `other`.\n\n See Also\n --------\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n Parameters `on`, `lsuffix`, and `rsuffix` are not supported when\n passing a list of `DataFrame` objects.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... 
\"key\": [\"K0\", \"K1\", \"K2\", \"K3\", \"K4\", \"K5\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... )\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K2 A2\n 3 K3 A3\n 4 K4 A4\n 5 K5 A5\n\n >>> other = pd.DataFrame({\"key\": [\"K0\", \"K1\", \"K2\"], \"B\": [\"B0\", \"B1\", \"B2\"]})\n\n >>> other\n key B\n 0 K0 B0\n 1 K1 B1\n 2 K2 B2\n\n Join DataFrames using their indexes.\n\n >>> df.join(other, lsuffix=\"_caller\", rsuffix=\"_other\")\n key_caller A key_other B\n 0 K0 A0 K0 B0\n 1 K1 A1 K1 B1\n 2 K2 A2 K2 B2\n 3 K3 A3 NaN NaN\n 4 K4 A4 NaN NaN\n 5 K5 A5 NaN NaN\n\n If we want to join using the key columns, we need to set key to be\n the index in both `df` and `other`. The joined DataFrame will have\n key as its index.\n\n >>> df.set_index(\"key\").join(other.set_index(\"key\"))\n A B\n key\n K0 A0 B0\n K1 A1 B1\n K2 A2 B2\n K3 A3 NaN\n K4 A4 NaN\n K5 A5 NaN\n\n Another option to join using the key columns is to use the `on`\n parameter. DataFrame.join always uses `other`'s index but we can use\n any column in `df`. This method preserves the original DataFrame's\n index in the result.\n\n >>> df.join(other.set_index(\"key\"), on=\"key\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K2 A2 B2\n 3 K3 A3 NaN\n 4 K4 A4 NaN\n 5 K5 A5 NaN\n\n Using non-unique key values shows how they are matched.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"key\": [\"K0\", \"K1\", \"K1\", \"K3\", \"K0\", \"K1\"],\n ... \"A\": [\"A0\", \"A1\", \"A2\", \"A3\", \"A4\", \"A5\"],\n ... }\n ... 
)\n\n >>> df\n key A\n 0 K0 A0\n 1 K1 A1\n 2 K1 A2\n 3 K3 A3\n 4 K0 A4\n 5 K1 A5\n\n >>> df.join(other.set_index(\"key\"), on=\"key\", validate=\"m:1\")\n key A B\n 0 K0 A0 B0\n 1 K1 A1 B1\n 2 K1 A2 B1\n 3 K3 A3 NaN\n 4 K0 A4 B0\n 5 K1 A5 B1\n \"\"\"\n from pandas.core.reshape.concat import concat\n from pandas.core.reshape.merge import merge\n\n if isinstance(other, Series):\n if other.name is None:\n raise ValueError(\"Other Series must have a name\")\n other = DataFrame({other.name: other})\n\n if isinstance(other, DataFrame):\n if how == \"cross\":\n return merge(\n self,\n other,\n how=how,\n on=on,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n return merge(\n self,\n other,\n left_on=on,\n how=how,\n left_index=on is None,\n right_index=True,\n suffixes=(lsuffix, rsuffix),\n sort=sort,\n validate=validate,\n )\n else:\n if on is not None:\n raise ValueError(\n \"Joining multiple DataFrames only supported for joining on index\"\n )\n\n if rsuffix or lsuffix:\n raise ValueError(\n \"Suffixes not supported when joining multiple DataFrames\"\n )\n\n # Mypy thinks the RHS is a\n # \"Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]\" whereas\n # the LHS is an \"Iterable[DataFrame]\", but in reality both types are\n # \"Iterable[Union[DataFrame, Series]]\" due to the if statements\n frames = [cast(\"DataFrame | Series\", self), *list(other)]\n\n can_concat = all(df.index.is_unique for df in frames)\n\n # join indexes only using concat\n if can_concat:\n if how in {\"left\", \"right\"}:\n res = concat(\n frames, axis=1, join=\"outer\", verify_integrity=True, sort=sort\n )\n index = self.index if how == \"left\" else frames[-1].index\n if sort:\n index = index.sort_values()\n result = res.reindex(index)\n return result\n else:\n if how == \"outer\":\n sort = True\n return concat(\n frames, axis=1, join=how, verify_integrity=True, sort=sort\n )\n\n joined = frames[0]\n\n for frame in frames[1:]:\n joined = merge(\n joined,\n 
frame,\n sort=sort,\n how=how,\n left_index=True,\n right_index=True,\n validate=validate,\n )\n\n return joined\n\n def merge(\n self,\n right: DataFrame | Series,\n how: MergeHow = \"inner\",\n on: IndexLabel | AnyArrayLike | None = None,\n left_on: IndexLabel | AnyArrayLike | None = None,\n right_on: IndexLabel | AnyArrayLike | None = None,\n left_index: bool = False,\n right_index: bool = False,\n sort: bool = False,\n suffixes: Suffixes = (\"_x\", \"_y\"),\n copy: bool | lib.NoDefault = lib.no_default,\n indicator: str | bool = False,\n validate: MergeValidate | None = None,\n ) -> DataFrame:\n \"\"\"\n Merge DataFrame or named Series objects with a database-style join.\n\n A named Series object is treated as a DataFrame with a single named column.\n\n The join is done on columns or indexes. If joining columns on\n columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes\n on indexes or indexes on a column or columns, the index will be passed on.\n When performing a cross merge, no column specifications to merge on are\n allowed.\n\n .. warning::\n\n If both key columns contain rows where the key is a null value, those\n rows will be matched against each other. 
This is different from usual SQL\n join behaviour and can lead to unexpected results.\n\n Parameters\n ----------\n right : DataFrame or named Series\n Object to merge with.\n how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},\n default 'inner'\n Type of merge to be performed.\n\n * left: use only keys from left frame, similar to a SQL left outer join;\n preserve key order.\n * right: use only keys from right frame, similar to a SQL right outer join;\n preserve key order.\n * outer: use union of keys from both frames, similar to a SQL full outer\n join; sort keys lexicographically.\n * inner: use intersection of keys from both frames, similar to a SQL inner\n join; preserve the order of the left keys.\n * cross: creates the cartesian product from both frames, preserves the order\n of the left keys.\n * left_anti: use only keys from left frame that are not in right frame,\n similar to SQL left anti join; preserve key order.\n\n .. versionadded:: 3.0\n * right_anti: use only keys from right frame that are not in left frame,\n similar to SQL right anti join; preserve key order.\n\n .. versionadded:: 3.0\n on : Hashable or a sequence of the previous\n Column or index level names to join on. These must be found in both\n DataFrames. If `on` is None and not merging on indexes then this defaults\n to the intersection of the columns in both DataFrames.\n left_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the left DataFrame. Can also\n be an array or list of arrays of the length of the left DataFrame.\n These arrays are treated as if they are columns.\n right_on : Hashable or a sequence of the previous, or array-like\n Column or index level names to join on in the right DataFrame. 
Can also\n be an array or list of arrays of the length of the right DataFrame.\n These arrays are treated as if they are columns.\n left_index : bool, default False\n Use the index from the left DataFrame as the join key(s). If it is a\n MultiIndex, the number of keys in the other DataFrame (either the index\n or a number of columns) must match the number of levels.\n right_index : bool, default False\n Use the index from the right DataFrame as the join key. Same caveats as\n left_index.\n sort : bool, default False\n Sort the join keys lexicographically in the result DataFrame. If False,\n the order of the join keys depends on the join type (how keyword).\n suffixes : list-like, default is (\"_x\", \"_y\")\n A length-2 sequence where each element is optionally a string\n indicating the suffix to add to overlapping column names in\n `left` and `right` respectively. Pass a value of `None` instead\n of a string to indicate that the column name from `left` or\n `right` should be left as-is, with no suffix. At least one of the\n values must not be None.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n indicator : bool or str, default False\n If True, adds a column to the output DataFrame called \"_merge\" with\n information on the source of each row. The column can be given a different\n name by providing a string argument. 
The column will have a Categorical\n type with the value of \"left_only\" for observations whose merge key only\n appears in the left DataFrame, \"right_only\" for observations\n whose merge key only appears in the right DataFrame, and \"both\"\n if the observation's merge key is found in both DataFrames.\n\n validate : str, optional\n If specified, checks if merge is of specified type.\n\n * \"one_to_one\" or \"1:1\": check if merge keys are unique in both\n left and right datasets.\n * \"one_to_many\" or \"1:m\": check if merge keys are unique in left\n dataset.\n * \"many_to_one\" or \"m:1\": check if merge keys are unique in right\n dataset.\n * \"many_to_many\" or \"m:m\": allowed, but does not result in checks.\n\n Returns\n -------\n DataFrame\n A DataFrame of the two merged objects.\n\n See Also\n --------\n merge_ordered : Merge with optional filling/interpolation.\n merge_asof : Merge on nearest keys.\n DataFrame.join : Similar method using indices.\n\n Examples\n --------\n >>> df1 = pd.DataFrame(\n ... {\"lkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [1, 2, 3, 5]}\n ... )\n >>> df2 = pd.DataFrame(\n ... {\"rkey\": [\"foo\", \"bar\", \"baz\", \"foo\"], \"value\": [5, 6, 7, 8]}\n ... )\n >>> df1\n lkey value\n 0 foo 1\n 1 bar 2\n 2 baz 3\n 3 foo 5\n >>> df2\n rkey value\n 0 foo 5\n 1 bar 6\n 2 baz 7\n 3 foo 8\n\n Merge df1 and df2 on the lkey and rkey columns. The value columns have\n the default suffixes, _x and _y, appended.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\")\n lkey value_x rkey value_y\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2 with specified left and right suffixes\n appended to any overlapping columns.\n\n >>> df1.merge(\n ... df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(\"_left\", \"_right\")\n ... 
)\n lkey value_left rkey value_right\n 0 foo 1 foo 5\n 1 foo 1 foo 8\n 2 bar 2 bar 6\n 3 baz 3 baz 7\n 4 foo 5 foo 5\n 5 foo 5 foo 8\n\n Merge DataFrames df1 and df2, but raise an exception if the DataFrames have\n any overlapping columns.\n\n >>> df1.merge(df2, left_on=\"lkey\", right_on=\"rkey\", suffixes=(False, False))\n Traceback (most recent call last):\n ...\n ValueError: columns overlap but no suffix specified:\n Index(['value'], dtype='object')\n\n >>> df1 = pd.DataFrame({\"a\": [\"foo\", \"bar\"], \"b\": [1, 2]})\n >>> df2 = pd.DataFrame({\"a\": [\"foo\", \"baz\"], \"c\": [3, 4]})\n >>> df1\n a b\n 0 foo 1\n 1 bar 2\n >>> df2\n a c\n 0 foo 3\n 1 baz 4\n\n >>> df1.merge(df2, how=\"inner\", on=\"a\")\n a b c\n 0 foo 1 3\n\n >>> df1.merge(df2, how=\"left\", on=\"a\")\n a b c\n 0 foo 1 3.0\n 1 bar 2 NaN\n\n >>> df1 = pd.DataFrame({\"left\": [\"foo\", \"bar\"]})\n >>> df2 = pd.DataFrame({\"right\": [7, 8]})\n >>> df1\n left\n 0 foo\n 1 bar\n >>> df2\n right\n 0 7\n 1 8\n\n >>> df1.merge(df2, how=\"cross\")\n left right\n 0 foo 7\n 1 foo 8\n 2 bar 7\n 3 bar 8\n \"\"\"\n self._check_copy_deprecation(copy)\n\n from pandas.core.reshape.merge import merge\n\n return merge(\n self,\n right,\n how=how,\n on=on,\n left_on=left_on,\n right_on=right_on,\n left_index=left_index,\n right_index=right_index,\n sort=sort,\n suffixes=suffixes,\n indicator=indicator,\n validate=validate,\n )\n\n def round(\n self, decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs\n ) -> DataFrame:\n \"\"\"\n Round numeric columns in a DataFrame to a variable number of decimal places.\n\n Each column can be rounded to a different number of decimal places by\n passing a dict or Series mapping column names to the desired precision.\n Non-numeric columns are left unchanged.\n\n Parameters\n ----------\n decimals : int, dict, Series\n Number of decimal places to round each column to. 
If an int is\n given, round each column to the same number of places.\n Otherwise dict and Series round to variable numbers of places.\n Column names should be in the keys if `decimals` is a\n dict-like, or in the index if `decimals` is a Series. Any\n columns not included in `decimals` will be left as is. Elements\n of `decimals` which are not columns of the input will be\n ignored.\n *args\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with numpy.\n\n Returns\n -------\n DataFrame\n A DataFrame with the affected columns rounded to the specified\n number of decimal places.\n\n See Also\n --------\n numpy.around : Round a numpy array to the given number of decimals.\n Series.round : Round a Series to the given number of decimals.\n\n Notes\n -----\n For values exactly halfway between rounded decimal values, pandas rounds\n to the nearest even value (e.g. -0.5 and 0.5 round to 0.0, 1.5 and 2.5\n round to 2.0, etc.).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(0.21, 0.32), (0.01, 0.67), (0.66, 0.03), (0.21, 0.18)],\n ... columns=[\"dogs\", \"cats\"],\n ... 
)\n >>> df\n dogs cats\n 0 0.21 0.32\n 1 0.01 0.67\n 2 0.66 0.03\n 3 0.21 0.18\n\n By providing an integer each column is rounded to the same number\n of decimal places\n\n >>> df.round(1)\n dogs cats\n 0 0.2 0.3\n 1 0.0 0.7\n 2 0.7 0.0\n 3 0.2 0.2\n\n With a dict, the number of places for specific columns can be\n specified with the column names as key and the number of decimal\n places as value\n\n >>> df.round({\"dogs\": 1, \"cats\": 0})\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n\n Using a Series, the number of places for specific columns can be\n specified with the column names as index and the number of\n decimal places as value\n\n >>> decimals = pd.Series([0, 1], index=[\"cats\", \"dogs\"])\n >>> df.round(decimals)\n dogs cats\n 0 0.2 0.0\n 1 0.0 1.0\n 2 0.7 0.0\n 3 0.2 0.0\n \"\"\"\n from pandas.core.reshape.concat import concat\n\n def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:\n for col, vals in df.items():\n try:\n yield _series_round(vals, decimals[col])\n except KeyError:\n yield vals\n\n def _series_round(ser: Series, decimals: int) -> Series:\n if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):\n return ser.round(decimals)\n elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):\n # GH#57781\n # TODO: also the ArrowDtype analogues?\n warnings.warn(\n \"obj.round has no effect with datetime, timedelta, \"\n \"or period dtypes. Use obj.dt.round(...) 
instead.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n return ser\n\n nv.validate_round(args, kwargs)\n\n if isinstance(decimals, (dict, Series)):\n if isinstance(decimals, Series) and not decimals.index.is_unique:\n raise ValueError(\"Index of decimals must be unique\")\n if is_dict_like(decimals) and not all(\n is_integer(value) for _, value in decimals.items()\n ):\n raise TypeError(\"Values in decimals must be integers\")\n new_cols = list(_dict_round(self, decimals))\n elif is_integer(decimals):\n # Dispatch to Block.round\n # Argument \"decimals\" to \"round\" of \"BaseBlockManager\" has incompatible\n # type \"Union[int, integer[Any]]\"; expected \"int\"\n new_mgr = self._mgr.round(\n decimals=decimals, # type: ignore[arg-type]\n )\n return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(\n self, method=\"round\"\n )\n else:\n raise TypeError(\"decimals must be an integer, a dict-like or a Series\")\n\n if new_cols is not None and len(new_cols) > 0:\n return self._constructor(\n concat(new_cols, axis=1), index=self.index, columns=self.columns\n ).__finalize__(self, method=\"round\")\n else:\n return self.copy(deep=False)\n\n # ----------------------------------------------------------------------\n # Statistical methods, etc.\n\n def describe(\n self,\n percentiles=None,\n include=None,\n exclude=None,\n ) -> DataFrame:\n \"\"\"\n Generate descriptive statistics.\n\n Summarize the central tendency, dispersion, and shape of each\n analyzed column's distribution, excluding ``NaN`` values. By\n default only numeric columns are analyzed; pass ``include`` to\n also analyze non-numeric columns (or ``exclude`` to omit columns\n by dtype).\n\n Parameters\n ----------\n percentiles : list-like of numbers, optional\n The percentiles to include in the output. All should fall\n between 0 and 1. 
The default, ``None``, returns the 25th,\n 50th, and 75th percentiles.\n include : 'all', list-like of dtypes or None (default), optional\n Which column dtypes to include. Options:\n\n - ``'all'`` : Include all columns, including non-numeric ones.\n - list-like of dtypes : Limit the result to columns of the\n given dtypes, in the style of\n :meth:`DataFrame.select_dtypes` (e.g. ``include=[np.number]``\n or ``include=[\"category\"]``).\n - ``None`` (default) : Include only numeric columns, falling\n back to object and categorical columns if there are no\n numeric columns.\n exclude : list-like of dtypes or None (default), optional\n Column dtypes to omit from the result, in the style of\n :meth:`DataFrame.select_dtypes`. ``None`` (default) excludes\n nothing.\n\n Returns\n -------\n DataFrame\n Summary statistics of the DataFrame's columns.\n\n See Also\n --------\n Series.describe : Generate descriptive statistics of a Series.\n DataFrame.count : Count of non-NA observations per column.\n DataFrame.max : Maximum of the values in each column.\n DataFrame.min : Minimum of the values in each column.\n DataFrame.mean : Mean of the values.\n DataFrame.std : Standard deviation of the observations.\n DataFrame.select_dtypes : Subset of a DataFrame including/excluding\n columns based on their dtype.\n\n Notes\n -----\n For numeric columns, the result's index includes ``count``,\n ``mean``, ``std``, ``min``, ``max``, and the requested\n percentiles. By default the lower percentile is ``25`` and the\n upper is ``75``; the ``50`` percentile is the same as the median.\n\n For object columns, the result's index includes ``count``,\n ``unique``, ``top``, and ``freq``. The ``top`` is the most common\n value and ``freq`` is its count. 
If multiple values tie for the\n highest count, ``top`` is chosen arbitrarily from among them.\n\n With ``include='all'``, the result's index is the union of the\n per-dtype indices, with ``NaN`` for statistics that do not apply\n to a given column's dtype.\n\n Examples\n --------\n By default, only numeric columns are analyzed.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"categorical\": pd.Categorical([\"d\", \"e\", \"f\"]),\n ... \"numeric\": [1, 2, 3],\n ... \"object\": [\"a\", \"b\", \"c\"],\n ... }\n ... )\n >>> df.describe()\n numeric\n count 3.0\n mean 2.0\n std 1.0\n min 1.0\n 25% 1.5\n 50% 2.0\n 75% 2.5\n max 3.0\n\n All columns regardless of dtype.\n\n >>> df.describe(include=\"all\") # doctest: +SKIP\n categorical numeric object\n count 3 3.0 3\n unique 3 NaN 3\n top f NaN a\n freq 1 NaN 1\n mean NaN 2.0 NaN\n std NaN 1.0 NaN\n min NaN 1.0 NaN\n 25% NaN 1.5 NaN\n 50% NaN 2.0 NaN\n 75% NaN 2.5 NaN\n max NaN 3.0 NaN\n\n Restrict the result to a specific dtype.\n\n >>> df.describe(include=[\"category\"])\n categorical\n count 3\n unique 3\n top d\n freq 1\n\n Exclude a specific dtype.\n\n >>> df.describe(exclude=[np.number]) # doctest: +SKIP\n categorical object\n count 3 3\n unique 3 3\n top f a\n freq 1 1\n \"\"\"\n return super().describe(\n percentiles=percentiles, include=include, exclude=exclude\n )\n\n def corr(\n self,\n method: CorrelationMethod = \"pearson\",\n min_periods: int = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise correlation of columns, excluding NA/null values.\n\n The result is a symmetric DataFrame where each element represents\n the correlation coefficient between two columns. 
By default, the\n Pearson correlation is computed, but Kendall and Spearman methods\n as well as arbitrary callables are also supported.\n\n Parameters\n ----------\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float. Note that the returned matrix from corr\n will have 1 along the diagonals and will be symmetric\n regardless of the callable's behavior.\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n Correlation matrix.\n\n See Also\n --------\n DataFrame.corrwith : Compute pairwise correlation with another\n DataFrame or Series.\n Series.corr : Compute the correlation between two Series.\n\n Notes\n -----\n Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.\n\n * `Pearson correlation coefficient `_\n * `Kendall rank correlation coefficient `_\n * `Spearman's rank correlation coefficient `_\n\n Examples\n --------\n >>> def histogram_intersection(a, b):\n ... v = np.minimum(a, b).sum().round(decimals=1)\n ... return v\n >>> df = pd.DataFrame(\n ... [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)],\n ... columns=[\"dogs\", \"cats\"],\n ... )\n >>> df.corr(method=histogram_intersection)\n dogs cats\n dogs 1.0 0.3\n cats 0.3 1.0\n\n >>> df = pd.DataFrame(\n ... [(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], columns=[\"dogs\", \"cats\"]\n ... 
)\n >>> df.corr(min_periods=3)\n dogs cats\n dogs 1.0 NaN\n cats NaN 1.0\n \"\"\" # noqa: E501\n data = self._get_numeric_data() if numeric_only else self\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if method == \"pearson\":\n correl = libalgos.nancorr(mat, minp=min_periods)\n elif method == \"spearman\":\n correl = libalgos.nancorr_spearman(mat, minp=min_periods)\n elif method == \"kendall\" or callable(method):\n if min_periods is None:\n min_periods = 1\n mat = mat.T\n corrf = nanops.get_corr_func(method)\n K = len(cols)\n correl = np.empty((K, K), dtype=float)\n mask = np.isfinite(mat)\n for i, ac in enumerate(mat):\n for j, bc in enumerate(mat):\n if i > j:\n continue\n\n valid = mask[i] & mask[j]\n if valid.sum() < min_periods:\n c = np.nan\n elif i == j:\n c = 1.0\n elif not valid.all():\n c = corrf(ac[valid], bc[valid])\n else:\n c = corrf(ac, bc)\n correl[i, j] = c\n correl[j, i] = c\n else:\n raise ValueError(\n \"method must be either 'pearson', \"\n \"'spearman', 'kendall', or a callable, \"\n f\"'{method}' was supplied\"\n )\n\n result = self._constructor(correl, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"corr\")\n\n def cov(\n self,\n min_periods: int | None = None,\n ddof: int | None = 1,\n numeric_only: bool = False,\n ) -> DataFrame:\n \"\"\"\n Compute pairwise covariance of columns, excluding NA/null values.\n\n Compute the pairwise covariance among the series of a DataFrame.\n The returned data frame is the `covariance matrix\n `__ of the columns\n of the DataFrame.\n\n Both NA and null values are automatically excluded from the\n calculation. (See the note below about bias from missing values.)\n A threshold can be set for the minimum number of\n observations for each value created. 
Comparisons with observations\n below this threshold will be returned as ``NaN``.\n\n This method is generally used for the analysis of time series data to\n understand the relationship between different measures\n across time.\n\n Parameters\n ----------\n min_periods : int, optional\n Minimum number of observations required per pair of columns\n to have a valid result.\n\n ddof : int, default 1\n Delta degrees of freedom. The divisor used in calculations\n is ``N - ddof``, where ``N`` represents the number of elements.\n This argument is applicable only when no ``nan`` is in the dataframe.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n Returns\n -------\n DataFrame\n The covariance matrix of the series of the DataFrame.\n\n See Also\n --------\n Series.cov : Compute covariance with another Series.\n core.window.ewm.ExponentialMovingWindow.cov : Exponential weighted sample\n covariance.\n core.window.expanding.Expanding.cov : Expanding sample covariance.\n core.window.rolling.Rolling.cov : Rolling sample covariance.\n\n Notes\n -----\n Returns the covariance matrix of the DataFrame's time series.\n The covariance is normalized by N-ddof.\n\n For DataFrames that have Series that are missing data (assuming that\n data is `missing at random\n `__)\n the returned covariance matrix will be an unbiased estimate\n of the variance and covariance between the member Series.\n\n However, for many applications this estimate may not be acceptable\n because the estimate covariance matrix is not guaranteed to be positive\n semi-definite. This could lead to estimate correlations having\n absolute values which are greater than one, and/or a non-invertible\n covariance matrix. See `Estimation of covariance matrices\n `__ for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
[(1, 2), (0, 3), (2, 0), (1, 1)], columns=[\"dogs\", \"cats\"]\n ... )\n >>> df.cov()\n dogs cats\n dogs 0.666667 -1.000000\n cats -1.000000 1.666667\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(\n ... np.random.randn(1000, 5), columns=[\"a\", \"b\", \"c\", \"d\", \"e\"]\n ... )\n >>> df.cov()\n a b c d e\n a 0.998438 -0.020161 0.059277 -0.008943 0.014144\n b -0.020161 1.059352 -0.008543 -0.024738 0.009826\n c 0.059277 -0.008543 1.010670 -0.001486 -0.000271\n d -0.008943 -0.024738 -0.001486 0.921297 -0.013692\n e 0.014144 0.009826 -0.000271 -0.013692 0.977795\n\n **Minimum number of periods**\n\n This method also supports an optional ``min_periods`` keyword\n that specifies the required minimum number of non-NA observations for\n each column pair in order to have a valid result:\n\n >>> np.random.seed(42)\n >>> df = pd.DataFrame(np.random.randn(20, 3), columns=[\"a\", \"b\", \"c\"])\n >>> df.loc[df.index[:5], \"a\"] = np.nan\n >>> df.loc[df.index[5:10], \"b\"] = np.nan\n >>> df.cov(min_periods=12)\n a b c\n a 0.316741 NaN -0.150812\n b NaN 1.248003 0.191417\n c -0.150812 0.191417 0.895202\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n if any(blk.dtype.kind in \"mM\" for blk in self._mgr.blocks):\n msg = (\n \"DataFrame contains columns with dtype datetime64 \"\n \"or timedelta64, which are not supported for cov.\"\n )\n raise TypeError(msg)\n cols = data.columns\n idx = cols.copy()\n mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)\n\n if notna(mat).all():\n if min_periods is not None and min_periods > len(mat):\n base_cov = np.empty((mat.shape[1], mat.shape[1]))\n base_cov.fill(np.nan)\n else:\n base_cov = np.cov(mat.T, ddof=ddof)\n base_cov = base_cov.reshape((len(cols), len(cols)))\n else:\n base_cov = libalgos.nancorr(mat, cov=True, minp=min_periods)\n\n result = self._constructor(base_cov, index=idx, columns=cols, copy=False)\n return result.__finalize__(self, method=\"cov\")\n\n def corrwith(\n self,\n other: 
DataFrame | Series,\n axis: Axis = 0,\n drop: bool = False,\n method: CorrelationMethod = \"pearson\",\n numeric_only: bool = False,\n min_periods: int | None = None,\n ) -> Series:\n \"\"\"\n Compute pairwise correlation.\n\n Pairwise correlation is computed between rows or columns of\n DataFrame with rows or columns of Series or DataFrame. DataFrames\n are first aligned along both axes before computing the\n correlations.\n\n Parameters\n ----------\n other : DataFrame, Series\n Object with which to compute correlations.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' to compute row-wise, 1 or 'columns' for\n column-wise.\n drop : bool, default False\n Drop missing indices from result.\n method : {'pearson', 'kendall', 'spearman'} or callable\n Method of correlation:\n\n * pearson : standard correlation coefficient\n * kendall : Kendall Tau correlation coefficient\n * spearman : Spearman rank correlation\n * callable: callable with input two 1d ndarrays\n and returning a float.\n\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n min_periods : int, optional\n Minimum number of observations needed to have a valid result.\n\n Returns\n -------\n Series\n Pairwise correlations.\n\n See Also\n --------\n DataFrame.corr : Compute pairwise correlation of columns.\n\n Examples\n --------\n >>> index = [\"a\", \"b\", \"c\", \"d\", \"e\"]\n >>> columns = [\"one\", \"two\", \"three\", \"four\"]\n >>> df1 = pd.DataFrame(\n ... np.arange(20).reshape(5, 4), index=index, columns=columns\n ... )\n >>> df2 = pd.DataFrame(\n ... np.arange(16).reshape(4, 4), index=index[:4], columns=columns\n ... 
)\n >>> df1.corrwith(df2)\n one 1.0\n two 1.0\n three 1.0\n four 1.0\n dtype: float64\n\n >>> df2.corrwith(df1, axis=1)\n a 1.0\n b 1.0\n c 1.0\n d 1.0\n e NaN\n dtype: float64\n \"\"\"\n axis = self._get_axis_number(axis)\n this = self._get_numeric_data() if numeric_only else self\n\n if isinstance(other, Series):\n return this.apply(\n lambda x: other.corr(x, method=method, min_periods=min_periods),\n axis=axis,\n )\n\n if numeric_only:\n other = other._get_numeric_data()\n left, right = this.align(other, join=\"inner\")\n\n if axis == 1:\n left = left.T\n right = right.T\n\n if method == \"pearson\":\n # mask missing values\n left = left + right * 0\n right = right + left * 0\n\n # demeaned data\n ldem = left - left.mean(numeric_only=numeric_only)\n rdem = right - right.mean(numeric_only=numeric_only)\n\n num = (ldem * rdem).sum()\n dom = (\n (left.count() - 1)\n * left.std(numeric_only=numeric_only)\n * right.std(numeric_only=numeric_only)\n )\n\n correl = num / dom\n\n elif method in [\"kendall\", \"spearman\"] or callable(method):\n\n def c(x):\n return nanops.nancorr(x[0], x[1], method=method)\n\n correl = self._constructor_sliced(\n map(c, zip(left.values.T, right.values.T, strict=True)),\n index=left.columns,\n copy=False,\n )\n\n else:\n raise ValueError(\n f\"Invalid method {method} was passed, \"\n \"valid methods are: 'pearson', 'kendall', \"\n \"'spearman', or callable\"\n )\n\n if not drop:\n # Find non-matching labels along the given axis\n # and append missing correlations (GH 22375)\n raxis: AxisInt = 1 if axis == 0 else 0\n result_index = this._get_axis(raxis).union(other._get_axis(raxis))\n idx_diff = result_index.difference(correl.index)\n\n if len(idx_diff) > 0:\n correl = correl._append_internal(\n Series([np.nan] * len(idx_diff), index=idx_diff)\n )\n\n return correl\n\n # ----------------------------------------------------------------------\n # ndarray-like stats methods\n\n def count(self, axis: Axis = 0, numeric_only: bool = False) -> 
Series:\n \"\"\"\n Count non-NA cells for each column or row.\n\n The values `None`, `NaN`, `NaT`, ``pandas.NA`` are considered NA.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n If 0 or 'index' counts are generated for each column.\n If 1 or 'columns' counts are generated for each row.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n For each column/row the number of non-NA/null entries.\n\n See Also\n --------\n Series.count: Number of non-NA elements in a Series.\n DataFrame.value_counts: Count unique combinations of columns.\n DataFrame.shape: Number of DataFrame rows and columns (including NA\n elements).\n DataFrame.isna: Boolean same-sized DataFrame showing places of NA\n elements.\n\n Examples\n --------\n Constructing DataFrame from a dictionary:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Person\": [\"John\", \"Myla\", \"Lewis\", \"John\", \"Myla\"],\n ... \"Age\": [24.0, np.nan, 21.0, 33, 26],\n ... \"Single\": [False, True, True, True, False],\n ... }\n ... 
)\n >>> df\n Person Age Single\n 0 John 24.0 False\n 1 Myla NaN True\n 2 Lewis 21.0 True\n 3 John 33.0 True\n 4 Myla 26.0 False\n\n Notice the uncounted NA values:\n\n >>> df.count()\n Person 5\n Age 4\n Single 5\n dtype: int64\n\n Counts for each **row**:\n\n >>> df.count(axis=\"columns\")\n 0 3\n 1 2\n 2 3\n 3 3\n 4 3\n dtype: int64\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if numeric_only:\n frame = self._get_numeric_data()\n else:\n frame = self\n\n # GH #423\n if len(frame._get_axis(axis)) == 0:\n result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))\n else:\n result = notna(frame).sum(axis=axis)\n\n return result.astype(\"int64\").__finalize__(self, method=\"count\")\n\n def _reduce(\n self,\n op,\n name: str,\n *,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n filter_type=None,\n **kwds,\n ):\n assert filter_type is None or filter_type == \"bool\", filter_type\n out_dtype = \"bool\" if filter_type == \"bool\" else None\n\n if axis is not None:\n axis = self._get_axis_number(axis)\n\n def func(values: np.ndarray):\n # We only use this in the case that operates on self.values\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def blk_func(values, axis: Axis = 1):\n if isinstance(values, ExtensionArray):\n if not is_1d_only_ea_dtype(values.dtype):\n return values._reduce(name, axis=1, skipna=skipna, **kwds)\n return values._reduce(name, skipna=skipna, keepdims=True, **kwds)\n else:\n return op(values, axis=axis, skipna=skipna, **kwds)\n\n def _get_data() -> DataFrame:\n if filter_type is None:\n data = self._get_numeric_data()\n else:\n # GH#25101, GH#24434\n assert filter_type == \"bool\"\n data = self._get_bool_data()\n return data\n\n # Case with EAs see GH#35881\n df = self\n if numeric_only:\n df = _get_data()\n if axis is None:\n dtype = find_common_type([block.values.dtype for block in df._mgr.blocks])\n if isinstance(dtype, ExtensionDtype):\n df = df.astype(dtype)\n arr = 
concat_compat(list(df._iter_column_arrays()))\n return arr._reduce(name, skipna=skipna, keepdims=False, **kwds)\n return maybe_unbox_numpy_scalar(func(df.values))\n elif axis == 1:\n if len(df.index) == 0:\n # Taking a transpose would result in no columns, losing the dtype.\n # In the empty case, reducing along axis 0 or 1 gives the same\n # result dtype, so reduce with axis=0 and ignore values\n result = df._reduce(\n op,\n name,\n axis=0,\n skipna=skipna,\n numeric_only=False,\n filter_type=filter_type,\n **kwds,\n ).iloc[:0]\n result.index = df.index\n return result\n\n if df.shape[1]:\n # GH#51474: block-wise axis=1 reduction avoiding expensive\n # transpose for numpy-backed and 2D EA blocks.\n if (\n name in (\"sum\", \"prod\", \"min\", \"max\", \"any\", \"all\", \"mean\")\n and len(df._mgr.blocks) > 1\n and all(\n (isinstance(bv, np.ndarray) and bv.dtype.kind != \"O\")\n or (\n isinstance(bv, ExtensionArray)\n and bv.ndim == 2\n and name in (\"min\", \"max\")\n and skipna\n )\n for bv in (block.values for block in df._mgr.blocks)\n )\n ):\n return df._reduce_axis1(\n name,\n op,\n skipna=skipna,\n min_count=kwds.get(\"min_count\", 0),\n )\n dtype = find_common_type(\n [block.values.dtype for block in df._mgr.blocks]\n )\n if isinstance(dtype, ExtensionDtype):\n # GH 54341: fastpath for EA-backed axis=1 reductions\n # This flattens the frame into a single 1D array while keeping\n # track of the row and column indices of the original frame. 
Once\n # flattened, grouping by the row indices and aggregating should\n # be equivalent to transposing the original frame and aggregating\n # with axis=0.\n name = {\"argmax\": \"idxmax\", \"argmin\": \"idxmin\"}.get(name, name)\n df = df.astype(dtype)\n arr = concat_compat(list(df._iter_column_arrays()))\n nrows, ncols = df.shape\n row_index = np.tile(np.arange(nrows), ncols)\n col_index = np.repeat(np.arange(ncols), nrows)\n ser = Series(arr, index=col_index, copy=False)\n if name == \"all\":\n # Behavior here appears incorrect; preserving\n # for backwards compatibility for now.\n # See https://github.com/pandas-dev/pandas/issues/57171\n skipna = True\n result = ser.groupby(row_index).agg(name, **kwds, skipna=skipna)\n result.index = df.index\n return result\n\n df = df.T\n\n # After possibly _get_data and transposing, we are now in the\n # simple case where we can use BlockManager.reduce\n res = df._mgr.reduce(blk_func)\n out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]\n out.name = None\n if out_dtype is not None and out.dtype != \"boolean\":\n out = out.astype(out_dtype)\n elif (df._mgr.get_dtypes() == object).any() and name not in [\"any\", \"all\"]:\n out = out.astype(object)\n\n return out\n\n def _reduce_axis1(\n self, name: str, func, skipna: bool, min_count: int = 0\n ) -> Series:\n \"\"\"\n Special case for _reduce to try to avoid a potentially-expensive transpose.\n\n Apply the reduction block-wise along axis=1 and then reduce the resulting\n 1D arrays.\n \"\"\"\n if name == \"all\":\n result = np.ones(len(self), dtype=bool)\n ufunc = np.logical_and\n elif name == \"any\":\n result = np.zeros(len(self), dtype=bool)\n # error: Incompatible types in assignment\n # (expression has type \"_UFunc_Nin2_Nout1[Literal['logical_or'],\n # Literal[20], Literal[False]]\", variable has type\n # \"_UFunc_Nin2_Nout1[Literal['logical_and'], Literal[20],\n # Literal[True]]\")\n ufunc = np.logical_or # type: ignore[assignment]\n elif name in (\"sum\", 
\"mean\"):\n result = None\n ufunc = np.add # type: ignore[assignment]\n elif name == \"prod\":\n result = None\n ufunc = np.multiply # type: ignore[assignment]\n elif name == \"min\":\n result = None\n ufunc = np.fmin if skipna else np.minimum # type: ignore[assignment]\n elif name == \"max\":\n result = None\n ufunc = np.fmax if skipna else np.maximum # type: ignore[assignment]\n else:\n raise NotImplementedError(name)\n\n for block in self._mgr.blocks:\n vals = block.values\n if name in (\"min\", \"max\"):\n middle = ufunc.reduce(vals, axis=0) # type: ignore[arg-type]\n elif name == \"mean\":\n middle = nanops.nansum(vals, axis=0, skipna=skipna, min_count=0) # type: ignore[arg-type]\n elif name in (\"sum\", \"prod\"):\n # min_count=0 here so each block produces a result;\n # the actual min_count threshold is applied across\n # all blocks after the loop.\n middle = func(vals, axis=0, skipna=skipna, min_count=0)\n else:\n middle = func(vals, axis=0, skipna=skipna)\n if result is None:\n result = middle.copy()\n else:\n result = ufunc(result, middle)\n\n # Handle min_count for sum/prod, and compute mean from sum/count\n if name in (\"sum\", \"prod\", \"mean\"):\n if (min_count > 0 or name == \"mean\") and result is not None:\n non_null_count = np.zeros(len(self), dtype=np.intp)\n for block in self._mgr.blocks:\n vals = block.values\n if vals.dtype.kind in \"biu\":\n # bool/int/uint cannot have NaN\n non_null_count += vals.shape[0]\n else:\n non_null_count += vals.shape[0] - isna(vals).sum(axis=0)\n if name == \"mean\":\n null_mask = non_null_count == 0\n result = result.astype(\"float64\")\n result[~null_mask] /= non_null_count[~null_mask]\n result[null_mask] = np.nan\n else:\n null_mask = non_null_count < min_count\n if null_mask.any():\n if result.dtype.kind not in \"fc\":\n result = result.astype(\"float64\")\n result[null_mask] = np.nan\n\n assert result is not None\n res_ser = self._constructor_sliced(result, index=self.index, copy=False)\n return res_ser\n\n 
# error: Signature of \"any\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def any(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def any(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def any(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n def any(\n self,\n *,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether any element is True, potentially over an axis.\n\n Returns False unless there is at least one element within a series or\n along a Dataframe axis that is True or equivalent (e.g. non-zero or\n non-empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be False, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n numpy.any : Numpy version of this method.\n Series.any : Return whether any element is True.\n Series.all : Return whether all elements are True.\n DataFrame.any : Return whether any element is True over requested axis.\n DataFrame.all : Return whether all elements are True over requested axis.\n\n Examples\n --------\n **Series**\n\n For Series input, the output is a scalar indicating whether any element\n is True.\n\n >>> pd.Series([False, False]).any()\n False\n >>> pd.Series([True, False]).any()\n True\n >>> pd.Series([], dtype=\"float64\").any()\n False\n >>> pd.Series([np.nan]).any()\n False\n >>> pd.Series([np.nan]).any(skipna=False)\n True\n\n **DataFrame**\n\n Whether each column contains at least one True element (the default).\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0, 2], \"C\": [0, 0]})\n >>> df\n A B C\n 0 1 0 0\n 1 2 2 0\n\n >>> df.any()\n A True\n B True\n C False\n dtype: bool\n\n Aggregating over the columns.\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 2]})\n >>> df\n A B\n 0 True 1\n 1 False 2\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 True\n dtype: bool\n\n >>> df = pd.DataFrame({\"A\": [True, False], \"B\": [1, 0]})\n >>> df\n A B\n 0 True 1\n 1 False 0\n\n >>> df.any(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Aggregating over the entire DataFrame with ``axis=None``.\n\n >>> df.any(axis=None)\n True\n\n `any` for an empty DataFrame is an empty Series.\n\n >>> pd.DataFrame([]).any()\n Series([], dtype: bool)\n \"\"\"\n result = self._logical_func(\n \"any\", nanops.nanany, axis, bool_only, skipna, 
**kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"any\")\n return result\n\n @overload\n def all(\n self,\n *,\n axis: Axis = ...,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def all(\n self,\n *,\n axis: None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> bool: ...\n\n @overload\n def all(\n self,\n *,\n axis: Axis | None,\n bool_only: bool = ...,\n skipna: bool = ...,\n **kwargs,\n ) -> Series | bool: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"all\")\n def all(\n self,\n axis: Axis | None = 0,\n bool_only: bool = False,\n skipna: bool = True,\n **kwargs,\n ) -> Series | bool:\n \"\"\"\n Return whether all elements are True, potentially over an axis.\n\n Returns True unless there is at least one element within a series or\n along a Dataframe axis that is False or equivalent (e.g. zero or\n empty).\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Indicate which axis or axes should be reduced. For `Series` this parameter\n is unused and defaults to 0.\n\n * 0 / 'index' : reduce the index, return a Series whose index is the\n original column labels.\n * 1 / 'columns' : reduce the columns, return a Series whose index is the\n original index.\n * None : reduce all axes, return a scalar.\n\n bool_only : bool, default False\n Include only boolean columns. Not implemented for Series.\n skipna : bool, default True\n Exclude NA/null values. If the entire row/column is NA and skipna is\n True, then the result will be True, as for an empty row/column.\n If skipna is False, NA values are treated as True for NumPy-backed\n dtypes (since they are not equal to zero). 
For nullable dtypes such\n as ``boolean``, NA values propagate following\n :ref:`Kleene logic `.\n **kwargs : any, default None\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or scalar\n If axis=None, then a scalar boolean is returned.\n Otherwise a Series is returned with index matching the index argument.\n\n See Also\n --------\n Series.all : Return True if all elements are True.\n DataFrame.any : Return True if one (or more) elements are True.\n\n Examples\n --------\n **Series**\n\n >>> pd.Series([True, True]).all()\n True\n >>> pd.Series([True, False]).all()\n False\n >>> pd.Series([], dtype=\"float64\").all()\n True\n >>> pd.Series([np.nan]).all()\n True\n >>> pd.Series([np.nan]).all(skipna=False)\n True\n\n **DataFrames**\n\n Create a DataFrame from a dictionary.\n\n >>> df = pd.DataFrame({\"col1\": [True, True], \"col2\": [True, False]})\n >>> df\n col1 col2\n 0 True True\n 1 True False\n\n Default behaviour checks if values in each column all return True.\n\n >>> df.all()\n col1 True\n col2 False\n dtype: bool\n\n Specify ``axis='columns'`` to check if values in each row all return True.\n\n >>> df.all(axis=\"columns\")\n 0 True\n 1 False\n dtype: bool\n\n Or ``axis=None`` for whether every value is True.\n\n >>> df.all(axis=None)\n False\n \"\"\"\n result = self._logical_func(\n \"all\", nanops.nanall, axis, bool_only, skipna, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"all\")\n return result\n\n # error: Signature of \"min\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def min(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def min(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def min(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n 
numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"min\")\n def min(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the minimum of the values over the requested axis.\n\n If you want the *index* of the minimum, use ``idxmin``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmin``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.min()\n 0\n \"\"\"\n result = super().min(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"min\")\n return result\n\n # error: Signature of \"max\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def max(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def max(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def max(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"max\")\n def max(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the maximum of the values over the requested axis.\n\n If you want the *index* of the maximum, use ``idxmax``.\n This is the equivalent of the ``numpy.ndarray`` method ``argmax``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... 
)\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.max()\n 8\n \"\"\"\n result = super().max(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"max\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sum\")\n def sum(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the sum of the values over the requested axis.\n\n This is equivalent to the method ``numpy.sum``.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sum with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Sum over requested axis.\n\n See Also\n --------\n Series.sum : Return the sum over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.std : Return the standard deviation of the values over the\n requested axis.\n\n Examples\n --------\n >>> idx = pd.MultiIndex.from_arrays(\n ... [[\"warm\", \"warm\", \"cold\", \"cold\"], [\"dog\", \"falcon\", \"fish\", \"spider\"]],\n ... names=[\"blooded\", \"animal\"],\n ... )\n >>> s = pd.Series([4, 2, 0, 8], name=\"legs\", index=idx)\n >>> s\n blooded animal\n warm dog 4\n falcon 2\n cold fish 0\n spider 8\n Name: legs, dtype: int64\n\n >>> s.sum()\n 14\n\n By default, the sum of an empty or all-NA Series is ``0``.\n\n >>> pd.Series([], dtype=\"float64\").sum() # min_count=0 is the default\n 0.0\n\n This can be controlled with the ``min_count`` parameter. 
For example, if\n you'd like the sum of an empty series to be NaN, pass ``min_count=1``.\n\n >>> pd.Series([], dtype=\"float64\").sum(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).sum()\n 0.0\n\n >>> pd.Series([np.nan]).sum(min_count=1)\n nan\n \"\"\"\n result = super().sum(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sum\")\n return result\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"prod\")\n def prod(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n min_count: int = 0,\n **kwargs,\n ) -> Series:\n \"\"\"\n Return the product of the values over the requested axis.\n\n This multiplies all values in each column (or row when\n ``axis=1``) together, skipping missing values by default.\n An empty or all-NA column returns ``1`` unless ``min_count``\n is specified.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.prod with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n\n min_count : int, default 0\n The required number of valid values to perform the operation. 
If fewer than\n ``min_count`` non-NA values are present the result will be NA.\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n The product of the values over the requested axis.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n By default, the product of an empty or all-NA Series is ``1``\n\n >>> pd.Series([], dtype=\"float64\").prod()\n 1.0\n\n This can be controlled with the ``min_count`` parameter\n\n >>> pd.Series([], dtype=\"float64\").prod(min_count=1)\n nan\n\n Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and\n empty series identically.\n\n >>> pd.Series([np.nan]).prod()\n 1.0\n\n >>> pd.Series([np.nan]).prod(min_count=1)\n nan\n \"\"\"\n result = super().prod(\n axis=axis,\n skipna=skipna,\n numeric_only=numeric_only,\n min_count=min_count,\n **kwargs,\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"prod\")\n return result\n\n # error: Signature of \"mean\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def mean(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def mean(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def mean(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n 
) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"mean\")\n def mean(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the mean of the values over the requested axis.\n\n This computes the arithmetic mean of the values in each column\n (or row when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.mean()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.mean()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.mean(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: 
float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.mean(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().mean(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"mean\")\n return result\n\n # error: Signature of \"median\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def median(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def median(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def median(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\"], name=\"median\"\n )\n def median(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return the median of the values over the requested axis.\n\n This computes the median of the values in each column (or row\n when ``axis=1``), skipping missing values by default.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Value containing the calculation referenced in the description.\n\n See Also\n --------\n Series.sum : Return the sum.\n Series.min : Return the minimum.\n Series.max : Return the maximum.\n Series.idxmin : Return the index of the minimum.\n Series.idxmax : Return the index of the maximum.\n DataFrame.sum : Return the sum over the requested axis.\n DataFrame.min : Return the minimum over the requested axis.\n DataFrame.max : Return the maximum over the requested axis.\n DataFrame.idxmin : Return the index of the minimum over the requested axis.\n DataFrame.idxmax : Return the index of the maximum over the requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.median()\n 2.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.median()\n a 1.5\n b 2.5\n dtype: float64\n\n Using axis=1\n\n >>> df.median(axis=1)\n tiger 1.5\n zebra 2.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.median(numeric_only=True)\n a 1.5\n dtype: float64\n \"\"\"\n result = super().median(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"median\")\n return result\n\n # error: Signature of \"sem\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sem(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: 
...\n\n @overload\n def sem(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def sem(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"sem\")\n def sem(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased standard error of the mean over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.sem with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. 
Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series\n Unbiased standard error of the mean over requested axis.\n\n See Also\n --------\n DataFrame.var : Return unbiased variance over requested axis.\n DataFrame.std : Returns sample standard deviation over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> round(s.sem(), 6)\n 0.57735\n\n With a DataFrame\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [2, 3]}, index=[\"tiger\", \"zebra\"])\n >>> df\n a b\n tiger 1 2\n zebra 2 3\n >>> df.sem()\n a 0.5\n b 0.5\n dtype: float64\n\n Using axis=1\n\n >>> df.sem(axis=1)\n tiger 0.5\n zebra 0.5\n dtype: float64\n\n In this case, `numeric_only` should be set to `True`\n to avoid getting an error.\n\n >>> df = pd.DataFrame({\"a\": [1, 2], \"b\": [\"T\", \"Z\"]}, index=[\"tiger\", \"zebra\"])\n >>> df.sem(numeric_only=True)\n a 0.5\n dtype: float64\n \"\"\"\n result = super().sem(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"sem\")\n return result\n\n # error: Signature of \"var\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def var(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def var(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def var(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"var\")\n def var(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased variance over requested 
axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.var with ``axis=None`` is deprecated;\n in a future version this will reduce over both axes and return a scalar.\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs :\n Additional keywords passed.\n\n Returns\n -------\n Series or scalar\n Unbiased variance over requested axis.\n\n See Also\n --------\n numpy.var : Equivalent function in NumPy.\n Series.var : Return unbiased variance over Series values.\n Series.std : Return standard deviation over Series values.\n DataFrame.std : Return standard deviation of the values over\n the requested axis.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n >>> df.var()\n age 352.916667\n height 0.056367\n dtype: float64\n\n Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:\n\n >>> df.var(ddof=0)\n age 264.687500\n height 0.042275\n dtype: float64\n \"\"\"\n result = super().var(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"var\")\n return result\n\n # error: Signature of \"std\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def std(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def std(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def std(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n ddof: int = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"std\")\n def std(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n ddof: int = 1,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return sample standard deviation over requested axis.\n\n Normalized by N-1 by default. This can be changed using the ddof argument.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n .. warning::\n\n The behavior of DataFrame.std with ``axis=None`` is deprecated,\n in a future version this will reduce over both axes and return a scalar\n To retain the old behavior, pass axis=0 (or do not pass axis).\n\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n ddof : int, default 1\n Delta Degrees of Freedom. The divisor used in calculations is N - ddof,\n where N represents the number of elements.\n numeric_only : bool, default False\n Include only float, int, boolean columns. Not implemented for Series.\n **kwargs : dict\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Standard deviation over requested axis.\n\n See Also\n --------\n Series.std : Return standard deviation over Series values.\n DataFrame.mean : Return the mean of the values over the requested axis.\n DataFrame.median : Return the median of the values over the requested axis.\n DataFrame.mode : Get the mode(s) of each element along the requested axis.\n DataFrame.sum : Return the sum of the values over the requested axis.\n\n Notes\n -----\n To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of\n the default ``ddof=1``) and ``skipna=False``.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"person_id\": [0, 1, 2, 3],\n ... \"age\": [21, 25, 62, 43],\n ... \"height\": [1.61, 1.87, 1.49, 2.01],\n ... }\n ... 
).set_index(\"person_id\")\n >>> df\n age height\n person_id\n 0 21 1.61\n 1 25 1.87\n 2 62 1.49\n 3 43 2.01\n\n The standard deviation of the columns can be found as follows:\n\n >>> df.std()\n age 18.786076\n height 0.237417\n dtype: float64\n\n Alternatively, `ddof=0` can be set to normalize by N instead of N-1:\n\n >>> df.std(ddof=0)\n age 16.269219\n height 0.205609\n dtype: float64\n \"\"\"\n result = super().std(\n axis=axis, skipna=skipna, ddof=ddof, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"std\")\n return result\n\n # error: Signature of \"skew\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def skew(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def skew(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def skew(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"skew\")\n def skew(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased skew over requested axis.\n\n Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased skew over requested axis.\n\n See Also\n --------\n DataFrame.kurt : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 3])\n >>> s.skew()\n 0.0\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [2, 3, 4], \"c\": [1, 3, 5]},\n ... index=[\"tiger\", \"zebra\", \"cow\"],\n ... )\n >>> df\n a b c\n tiger 1 2 1\n zebra 2 3 3\n cow 3 4 5\n >>> df.skew()\n a 0.0\n b 0.0\n c 0.0\n dtype: float64\n\n Using axis=1\n\n >>> df.skew(axis=1)\n tiger 1.732051\n zebra -1.732051\n cow 0.000000\n dtype: float64\n\n In this case, `numeric_only` should be set to `True` to avoid\n getting an error.\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 3], \"b\": [\"T\", \"Z\", \"X\"]}, index=[\"tiger\", \"zebra\", \"cow\"]\n ... 
)\n >>> df.skew(numeric_only=True)\n a 0.0\n dtype: float64\n \"\"\"\n result = super().skew(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"skew\")\n return result\n\n # error: Signature of \"kurt\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def kurt(\n self,\n *,\n axis: Axis = ...,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Any: ...\n\n @overload\n def kurt(\n self,\n *,\n axis: Axis | None,\n skipna: bool = ...,\n numeric_only: bool = ...,\n **kwargs,\n ) -> Series | Any: ...\n\n @deprecate_nonkeyword_arguments(Pandas4Warning, allowed_args=[\"self\"], name=\"kurt\")\n def kurt(\n self,\n axis: Axis | None = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n **kwargs,\n ) -> Series | Any:\n \"\"\"\n Return unbiased kurtosis over requested axis.\n\n Kurtosis obtained using Fisher's definition of\n kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n\n Parameters\n ----------\n axis : {index (0), columns (1)}, default 0\n Axis for the function to be applied on.\n For `Series` this parameter is unused and defaults to 0.\n\n For DataFrames, specifying ``axis=None`` will apply the aggregation\n across both axes.\n\n .. 
versionadded:: 2.0.0\n\n skipna : bool, default True\n Exclude NA/null values when computing the result.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n\n **kwargs\n Additional keyword arguments to be passed to the function.\n\n Returns\n -------\n Series or scalar\n Unbiased kurtosis over requested axis.\n\n See Also\n --------\n DataFrame.kurtosis : Returns unbiased kurtosis over requested axis.\n\n Examples\n --------\n >>> s = pd.Series([1, 2, 2, 3], index=[\"cat\", \"dog\", \"dog\", \"mouse\"])\n >>> s\n cat 1\n dog 2\n dog 2\n mouse 3\n dtype: int64\n >>> round(s.kurt(), 6)\n 1.5\n\n With a DataFrame\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2, 2, 3], \"b\": [3, 4, 4, 4]},\n ... index=[\"cat\", \"dog\", \"dog\", \"mouse\"],\n ... )\n >>> df\n a b\n cat 1 3\n dog 2 4\n dog 2 4\n mouse 3 4\n >>> round(df.kurt(), 6)\n a 1.5\n b 4.0\n dtype: float64\n\n With axis=None\n\n >>> round(df.kurt(axis=None), 6)\n -0.988693\n\n Using axis=1\n\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2], \"b\": [3, 4], \"c\": [3, 4], \"d\": [1, 2]},\n ... index=[\"cat\", \"dog\"],\n ... )\n >>> df.kurt(axis=1)\n cat -6.0\n dog -6.0\n dtype: float64\n \"\"\"\n result = super().kurt(\n axis=axis, skipna=skipna, numeric_only=numeric_only, **kwargs\n )\n if isinstance(result, Series):\n result = result.__finalize__(self, method=\"kurt\")\n return result\n\n # error: Incompatible types in assignment\n kurtosis = kurt # type: ignore[assignment]\n product = prod\n\n def cummin(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative minimum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n minimum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 
0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative minimum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.min : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.min : Return the minimum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummin()\n 0 2.0\n 1 NaN\n 2 2.0\n 3 -1.0\n 4 -1.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummin(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the minimum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummin()\n A B\n 0 2.0 1.0\n 1 2.0 NaN\n 2 1.0 0.0\n\n To iterate over columns and find the minimum in each row,\n use ``axis=1``\n\n >>> df.cummin(axis=1)\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummin(data, axis, skipna, *args, **kwargs)\n\n def cummax(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative maximum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n maximum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative maximum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.max : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.max : Return the maximum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cummax()\n 0 2.0\n 1 NaN\n 2 5.0\n 3 5.0\n 4 5.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cummax(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the maximum\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cummax()\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 3.0 1.0\n\n To iterate over columns and find the maximum in each row,\n use ``axis=1``\n\n >>> df.cummax(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cummax(data, axis, skipna, *args, **kwargs)\n\n def cumsum(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative sum over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n sum.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. 
If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative sum of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.sum : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.sum : Return the sum over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumsum()\n 0 2.0\n 1 NaN\n 2 7.0\n 3 6.0\n 4 6.0\n dtype: float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumsum(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the sum\n in each column. 
This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumsum()\n A B\n 0 2.0 1.0\n 1 5.0 NaN\n 2 6.0 1.0\n\n To iterate over columns and find the sum in each row,\n use ``axis=1``\n\n >>> df.cumsum(axis=1)\n A B\n 0 2.0 3.0\n 1 3.0 NaN\n 2 1.0 1.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumsum(data, axis, skipna, *args, **kwargs)\n\n def cumprod(\n self,\n axis: Axis = 0,\n skipna: bool = True,\n numeric_only: bool = False,\n *args,\n **kwargs,\n ) -> Self:\n \"\"\"\n Return cumulative product over a DataFrame or Series axis.\n\n Returns a DataFrame or Series of the same size containing the cumulative\n product.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The index or the name of the axis. 0 is equivalent to None or 'index'.\n For `Series` this parameter is unused and defaults to 0.\n skipna : bool, default True\n Exclude NA/null values. If an entire row/column is NA, the result\n will be NA.\n numeric_only : bool, default False\n Include only float, int, boolean columns.\n *args, **kwargs\n Additional keywords have no effect but might be accepted for\n compatibility with NumPy.\n\n Returns\n -------\n Series or DataFrame\n Return cumulative product of Series or DataFrame.\n\n See Also\n --------\n core.window.expanding.Expanding.prod : Similar functionality\n but ignores ``NaN`` values.\n DataFrame.prod : Return the product over\n DataFrame axis.\n DataFrame.cummax : Return cumulative maximum over DataFrame axis.\n DataFrame.cummin : Return cumulative minimum over DataFrame axis.\n DataFrame.cumsum : Return cumulative sum over DataFrame axis.\n DataFrame.cumprod : Return cumulative product over DataFrame axis.\n\n Examples\n --------\n **Series**\n\n >>> s = pd.Series([2, np.nan, 5, -1, 0])\n >>> s\n 0 2.0\n 1 NaN\n 2 5.0\n 3 -1.0\n 4 0.0\n dtype: float64\n\n By default, NA values are ignored.\n\n >>> s.cumprod()\n 0 2.0\n 1 NaN\n 2 10.0\n 3 -10.0\n 4 -0.0\n dtype: 
float64\n\n To include NA values in the operation, use ``skipna=False``\n\n >>> s.cumprod(skipna=False)\n 0 2.0\n 1 NaN\n 2 NaN\n 3 NaN\n 4 NaN\n dtype: float64\n\n **DataFrame**\n\n >>> df = pd.DataFrame(\n ... [[2.0, 1.0], [3.0, np.nan], [1.0, 0.0]], columns=list(\"AB\")\n ... )\n >>> df\n A B\n 0 2.0 1.0\n 1 3.0 NaN\n 2 1.0 0.0\n\n By default, iterates over rows and finds the product\n in each column. This is equivalent to ``axis=None`` or ``axis='index'``.\n\n >>> df.cumprod()\n A B\n 0 2.0 1.0\n 1 6.0 NaN\n 2 6.0 0.0\n\n To iterate over columns and find the product in each row,\n use ``axis=1``\n\n >>> df.cumprod(axis=1)\n A B\n 0 2.0 2.0\n 1 3.0 NaN\n 2 1.0 0.0\n \"\"\"\n data = self._get_numeric_data() if numeric_only else self\n return NDFrame.cumprod(data, axis, skipna, *args, **kwargs)\n\n def nunique(self, axis: Axis = 0, dropna: bool = True) -> Series:\n \"\"\"\n Count number of distinct elements in specified axis.\n\n Return Series with number of distinct elements. Can ignore NaN\n values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 
0 or 'index' for row-wise, 1 or 'columns' for\n column-wise.\n dropna : bool, default True\n Don't include NaN in the counts.\n\n Returns\n -------\n Series\n Series with counts of unique values per row or column, depending on `axis`.\n\n See Also\n --------\n Series.nunique: Method nunique for Series.\n DataFrame.count: Count non-NA cells for each column or row.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [4, 1, 1]})\n >>> df.nunique()\n A 3\n B 2\n dtype: int64\n\n >>> df.nunique(axis=1)\n 0 1\n 1 2\n 2 2\n dtype: int64\n \"\"\"\n return self.apply(Series.nunique, axis=axis, dropna=dropna)\n\n def idxmin(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of minimum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of minima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmin : Return index of the minimum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmin``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... 
)\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the minimum value in each column.\n\n >>> df.idxmin()\n consumption Pork\n co2_emissions Wheat Products\n dtype: str\n\n To return the index for the minimum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmin(axis=\"columns\")\n Pork consumption\n Wheat Products co2_emissions\n Beef consumption\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmin, \"argmin\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be np.ndarray since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmin\")\n\n def idxmax(\n self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False\n ) -> Series:\n \"\"\"\n Return index of first occurrence of maximum over requested axis.\n\n NA/null values are excluded.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n skipna : bool, default True\n Exclude NA/null values. 
If the entire DataFrame is NA,\n or if ``skipna=False`` and there is an NA value, this method\n will raise a ``ValueError``.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n Returns\n -------\n Series\n Indexes of maxima along the specified axis.\n\n Raises\n ------\n ValueError\n * If the row/column is empty\n\n See Also\n --------\n Series.idxmax : Return index of the maximum element.\n\n Notes\n -----\n This method is the DataFrame version of ``ndarray.argmax``.\n\n Examples\n --------\n Consider a dataset containing food consumption in Argentina.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"consumption\": [10.51, 103.11, 55.48],\n ... \"co2_emissions\": [37.2, 19.66, 1712],\n ... },\n ... index=[\"Pork\", \"Wheat Products\", \"Beef\"],\n ... )\n\n >>> df\n consumption co2_emissions\n Pork 10.51 37.20\n Wheat Products 103.11 19.66\n Beef 55.48 1712.00\n\n By default, it returns the index for the maximum value in each column.\n\n >>> df.idxmax()\n consumption Wheat Products\n co2_emissions Beef\n dtype: str\n\n To return the index for the maximum value in each row, use ``axis=\"columns\"``.\n\n >>> df.idxmax(axis=\"columns\")\n Pork co2_emissions\n Wheat Products consumption\n Beef co2_emissions\n dtype: str\n \"\"\"\n axis = self._get_axis_number(axis)\n\n if self.empty and len(self.axes[axis]):\n axis_dtype = self.axes[axis].dtype\n return self._constructor_sliced(dtype=axis_dtype)\n\n if numeric_only:\n data = self._get_numeric_data()\n else:\n data = self\n\n res = data._reduce(\n nanops.nanargmax, \"argmax\", axis=axis, skipna=skipna, numeric_only=False\n )\n indices = res._values\n # indices will always be 1d array since axis is not None\n\n if (indices == -1).any():\n if skipna:\n msg = \"Encountered all NA values\"\n else:\n msg = \"Encountered an NA values with skipna=False\"\n raise ValueError(msg)\n\n index = data._get_axis(axis)\n result = index.take(indices, allow_fill=True)._values\n final_result = 
data._constructor_sliced(result, index=data._get_agg_axis(axis))\n return final_result.__finalize__(self, method=\"idxmax\")\n\n def _get_agg_axis(self, axis_num: int) -> Index:\n \"\"\"\n Let's be explicit about this.\n \"\"\"\n if axis_num == 0:\n return self.columns\n elif axis_num == 1:\n return self.index\n else:\n raise ValueError(f\"Axis must be 0 or 1 (got {axis_num!r})\")\n\n def mode(\n self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True\n ) -> DataFrame:\n \"\"\"\n Get the mode(s) of each element along the selected axis.\n\n The mode of a set of values is the value that appears most often.\n It can be multiple values.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to iterate over while searching for the mode:\n\n * 0 or 'index' : get mode of each column\n * 1 or 'columns' : get mode of each row.\n\n numeric_only : bool, default False\n If True, only apply to numeric columns.\n dropna : bool, default True\n Don't consider counts of NaN/NaT.\n\n Returns\n -------\n DataFrame\n The modes of each column or row.\n\n See Also\n --------\n Series.mode : Return the highest frequency value in a Series.\n Series.value_counts : Return the counts of values in a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"bird\", 2, 2),\n ... (\"mammal\", 4, np.nan),\n ... (\"arthropod\", 8, 0),\n ... (\"bird\", 2, np.nan),\n ... ],\n ... index=(\"falcon\", \"horse\", \"spider\", \"ostrich\"),\n ... columns=(\"species\", \"legs\", \"wings\"),\n ... )\n >>> df\n species legs wings\n falcon bird 2 2.0\n horse mammal 4 NaN\n spider arthropod 8 0.0\n ostrich bird 2 NaN\n\n By default, missing values are not considered, and the mode of wings\n are both 0 and 2. 
Because the resulting DataFrame has two rows,\n the second row of ``species`` and ``legs`` contains ``NaN``.\n\n >>> df.mode()\n species legs wings\n 0 bird 2.0 0.0\n 1 NaN NaN 2.0\n\n Setting ``dropna=False`` ``NaN`` values are considered and they can be\n the mode (like for wings).\n\n >>> df.mode(dropna=False)\n species legs wings\n 0 bird 2 NaN\n\n Setting ``numeric_only=True``, only the mode of numeric columns is\n computed, and columns of other types are ignored.\n\n >>> df.mode(numeric_only=True)\n legs wings\n 0 2.0 0.0\n 1 NaN 2.0\n\n To compute the mode over columns and not rows, use the axis parameter:\n\n >>> df.mode(axis=\"columns\", numeric_only=True)\n 0 1\n falcon 2.0 NaN\n horse 4.0 NaN\n spider 0.0 8.0\n ostrich 2.0 NaN\n \"\"\"\n data = self if not numeric_only else self._get_numeric_data()\n\n def f(s):\n return s.mode(dropna=dropna)\n\n data = data.apply(f, axis=axis)\n # Ensure index is type stable (should always use int index)\n if data.empty:\n data.index = default_index(0)\n\n return data\n\n @overload\n def quantile(\n self,\n q: float = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series: ...\n\n @overload\n def quantile(\n self,\n q: AnyArrayLike | Sequence[float],\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n @overload\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = ...,\n axis: Axis = ...,\n numeric_only: bool = ...,\n interpolation: QuantileInterpolation = ...,\n method: Literal[\"single\", \"table\"] = ...,\n ) -> Series | DataFrame: ...\n\n def quantile(\n self,\n q: float | AnyArrayLike | Sequence[float] = 0.5,\n axis: Axis = 0,\n numeric_only: bool = False,\n interpolation: QuantileInterpolation = \"linear\",\n method: Literal[\"single\", \"table\"] = \"single\",\n ) -> 
Series | DataFrame:\n \"\"\"\n Return values at the given quantile over requested axis.\n\n This method computes the value below which a given proportion of\n observations fall. By default, it computes quantiles column-wise,\n but row-wise computation is also supported via ``axis=1``.\n\n Parameters\n ----------\n q : float or array-like, default 0.5 (50% quantile)\n Value between 0 <= q <= 1, the quantile(s) to compute.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.\n numeric_only : bool, default False\n Include only `float`, `int` or `boolean` data.\n\n .. versionchanged:: 2.0.0\n The default value of ``numeric_only`` is now ``False``.\n\n interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}\n This optional parameter specifies the interpolation method to use,\n when the desired quantile lies between two data points `i` and `j`:\n\n * linear: `i + (j - i) * fraction`, where `fraction` is the\n fractional part of the index surrounded by `i` and `j`.\n * lower: `i`.\n * higher: `j`.\n * nearest: `i` or `j` whichever is nearest.\n * midpoint: (`i` + `j`) / 2.\n method : {'single', 'table'}, default 'single'\n Whether to compute quantiles per-column ('single') or over all columns\n ('table'). When 'table', the only allowed interpolation methods are\n 'nearest', 'lower', and 'higher'.\n\n Returns\n -------\n Series or DataFrame\n\n If ``q`` is an array, a DataFrame will be returned where the\n index is ``q``, the columns are the columns of self, and the\n values are the quantiles.\n If ``q`` is a float, a Series will be returned where the\n index is the columns of self and the values are the quantiles.\n\n See Also\n --------\n core.window.rolling.Rolling.quantile: Rolling quantile.\n numpy.percentile: Numpy function to compute the percentile.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=[\"a\", \"b\"]\n ... 
)\n >>> df.quantile(0.1)\n a 1.3\n b 3.7\n Name: 0.1, dtype: float64\n >>> df.quantile([0.1, 0.5])\n a b\n 0.1 1.3 3.7\n 0.5 2.5 55.0\n\n Specifying `method='table'` will compute the quantile over all columns.\n\n >>> df.quantile(0.1, method=\"table\", interpolation=\"nearest\")\n a 1\n b 1\n Name: 0.1, dtype: int64\n >>> df.quantile([0.1, 0.5], method=\"table\", interpolation=\"nearest\")\n a b\n 0.1 1 1\n 0.5 3 100\n\n Specifying `numeric_only=False` will compute the quantiles for all\n columns.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"A\": [1, 2],\n ... \"B\": [pd.Timestamp(\"2010\"), pd.Timestamp(\"2011\")],\n ... \"C\": [pd.Timedelta(\"1 days\"), pd.Timedelta(\"2 days\")],\n ... }\n ... )\n >>> df.quantile(0.5, numeric_only=False)\n A 1.5\n B 2010-07-02 12:00:00\n C 1 days 12:00:00\n Name: 0.5, dtype: object\n \"\"\"\n validate_percentile(q)\n axis = self._get_axis_number(axis)\n\n if not is_list_like(q):\n # BlockManager.quantile expects listlike, so we wrap and unwrap here\n # error: List item 0 has incompatible type \"float | ExtensionArray |\n # ndarray[Any, Any] | Index | Series | Sequence[float]\"; expected \"float\"\n res_df = self.quantile(\n [q], # type: ignore[list-item]\n axis=axis,\n numeric_only=numeric_only,\n interpolation=interpolation,\n method=method,\n )\n if method == \"single\":\n res = res_df.iloc[0]\n else:\n # cannot directly iloc over sparse arrays\n res = res_df.T.iloc[:, 0]\n if axis == 1 and len(self) == 0:\n # GH#41544 try to get an appropriate dtype\n dtype = find_common_type(list(self.dtypes))\n if needs_i8_conversion(dtype):\n return res.astype(dtype)\n return res\n\n q = Index(q, dtype=np.float64)\n data = self._get_numeric_data() if numeric_only else self\n\n if axis == 1:\n data = data.T\n\n if len(data.columns) == 0:\n # GH#23925 _get_numeric_data may have dropped all columns\n cols = self.columns[:0]\n\n dtype = np.float64\n if axis == 1:\n # GH#41544 try to get an appropriate dtype\n cdtype = 
find_common_type(list(self.dtypes))\n if needs_i8_conversion(cdtype):\n dtype = cdtype\n\n res = self._constructor([], index=q, columns=cols, dtype=dtype)\n return res.__finalize__(self, method=\"quantile\")\n\n valid_method = {\"single\", \"table\"}\n if method not in valid_method:\n raise ValueError(\n f\"Invalid method: {method}. Method must be in {valid_method}.\"\n )\n if method == \"single\":\n res = data._mgr.quantile(qs=q, interpolation=interpolation)\n elif method == \"table\":\n valid_interpolation = {\"nearest\", \"lower\", \"higher\"}\n if interpolation not in valid_interpolation:\n raise ValueError(\n f\"Invalid interpolation: {interpolation}. \"\n f\"Interpolation must be in {valid_interpolation}\"\n )\n # handle degenerate case\n if len(data) == 0:\n if data.ndim == 2:\n dtype = find_common_type(list(self.dtypes))\n else:\n dtype = self.dtype\n return self._constructor([], index=q, columns=data.columns, dtype=dtype)\n\n q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)\n\n by = data.columns\n if len(by) > 1:\n keys = [data._get_label_or_level_values(x) for x in by]\n indexer = lexsort_indexer(keys)\n else:\n k = data._get_label_or_level_values(by[0])\n indexer = nargsort(k)\n\n res = data._mgr.take(indexer[q_idx], verify=False)\n res.axes[1] = q\n\n result = self._constructor_from_mgr(res, axes=res.axes)\n return result.__finalize__(self, method=\"quantile\")\n\n def to_timestamp(\n self,\n freq: Frequency | None = None,\n how: ToTimestampHow = \"start\",\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Cast PeriodIndex to DatetimeIndex of timestamps, at *beginning* of period.\n\n This can be changed to the *end* of the period, by specifying `how=\"e\"`.\n\n Parameters\n ----------\n freq : str, default frequency of PeriodIndex\n Desired frequency.\n how : {'s', 'e', 'start', 'end'}\n Convention for converting period to timestamp; start of period\n vs. 
end.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame with DatetimeIndex\n DataFrame with the PeriodIndex cast to DatetimeIndex.\n\n See Also\n --------\n DataFrame.to_period: Inverse method to cast DatetimeIndex to PeriodIndex.\n Series.to_timestamp: Equivalent method for Series.\n\n Examples\n --------\n >>> idx = pd.PeriodIndex([\"2023\", \"2024\"], freq=\"Y\")\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d, index=idx)\n >>> df1\n col1 col2\n 2023 1 3\n 2024\t 2 4\n\n The resulting timestamps will be at the beginning of the year in this case\n\n >>> df1 = df1.to_timestamp()\n >>> df1\n col1 col2\n 2023-01-01 1 3\n 2024-01-01 2 4\n >>> df1.index\n DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[us]', freq=None)\n\n Using `freq` which is the offset that the Timestamps will have\n\n >>> df2 = pd.DataFrame(data=d, index=idx)\n >>> df2 = df2.to_timestamp(freq=\"M\")\n >>> df2\n col1 col2\n 2023-01-31 1 3\n 2024-01-31 2 4\n >>> df2.index\n DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[us]', freq=None)\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, PeriodIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_timestamp(freq=freq, how=how)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def to_period(\n self,\n freq: Frequency | None = 
None,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Convert DataFrame from DatetimeIndex to PeriodIndex.\n\n Convert DataFrame from DatetimeIndex to PeriodIndex with desired\n frequency (inferred from index if not passed). Either index or columns can be\n converted, depending on the `axis` argument.\n\n Parameters\n ----------\n freq : str, default\n Frequency of the PeriodIndex.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to convert (the index by default).\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The DataFrame with the converted PeriodIndex.\n\n See Also\n --------\n Series.to_period: Equivalent method for Series.\n Series.dt.to_period: Convert DateTime column values.\n\n Examples\n --------\n >>> idx = pd.to_datetime(\n ... [\n ... \"2001-03-31 00:00:00\",\n ... \"2002-05-31 00:00:00\",\n ... \"2003-08-31 00:00:00\",\n ... ]\n ... 
)\n\n >>> idx\n DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],\n dtype='datetime64[us]', freq=None)\n\n >>> idx.to_period(\"M\")\n PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')\n\n For the yearly frequency\n\n >>> idx.to_period(\"Y\")\n PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')\n \"\"\"\n self._check_copy_deprecation(copy)\n new_obj = self.copy(deep=False)\n\n axis_name = self._get_axis_name(axis)\n old_ax = getattr(self, axis_name)\n if not isinstance(old_ax, DatetimeIndex):\n raise TypeError(f\"unsupported Type {type(old_ax).__name__}\")\n\n new_ax = old_ax.to_period(freq=freq)\n\n setattr(new_obj, axis_name, new_ax)\n return new_obj\n\n def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:\n \"\"\"\n Whether each element in the DataFrame is contained in values.\n\n Returns a DataFrame of the same shape with boolean values: True\n where the element is in the corresponding structure of\n ``values``, False otherwise. ``values`` can be a list, dict,\n Series, or DataFrame; alignment rules depend on its type.\n\n Parameters\n ----------\n values : iterable, Series, DataFrame or dict\n The result will only be true at a location if all the\n labels match. If `values` is a Series, that's the index. If\n `values` is a dict, the keys must be the column names,\n which must match. If `values` is a DataFrame,\n then both the index and column labels must match.\n\n Returns\n -------\n DataFrame\n DataFrame of booleans showing whether each element in the DataFrame\n is contained in values.\n\n See Also\n --------\n DataFrame.eq: Equality test for DataFrame.\n Series.isin: Equivalent method on Series.\n Series.str.contains: Test if pattern or regex is contained within a\n string of a Series or Index.\n\n Notes\n -----\n ``__iter__`` is used (and not ``__contains__``) to iterate over values\n when checking if it contains the elements in DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"num_legs\": [2, 4], \"num_wings\": [2, 0]}, index=[\"falcon\", \"dog\"]\n ... )\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n\n When ``values`` is a list check whether every value in the DataFrame\n is present in the list (which animals have 0 or 2 legs or wings)\n\n >>> df.isin([0, 2])\n num_legs num_wings\n falcon True True\n dog False True\n\n To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:\n\n >>> ~df.isin([0, 2])\n num_legs num_wings\n falcon False False\n dog True False\n\n When ``values`` is a dict, we can pass values to check for each\n column separately:\n\n >>> df.isin({\"num_wings\": [0, 3]})\n num_legs num_wings\n falcon False False\n dog False True\n\n When ``values`` is a Series or DataFrame the index and column must\n match. Note that 'falcon' does not match based on the number of legs\n in other.\n\n >>> other = pd.DataFrame(\n ... {\"num_legs\": [8, 3], \"num_wings\": [0, 2]}, index=[\"spider\", \"falcon\"]\n ... )\n >>> df.isin(other)\n num_legs num_wings\n falcon False True\n dog False False\n \"\"\"\n if isinstance(values, dict):\n from pandas.core.reshape.concat import concat\n\n values = collections.defaultdict(list, values)\n result = concat(\n (\n self.iloc[:, [i]].isin(values[col])\n for i, col in enumerate(self.columns)\n ),\n axis=1,\n )\n elif isinstance(values, Series):\n if not values.index.is_unique:\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self), axis=\"index\")\n elif isinstance(values, DataFrame):\n if not (values.columns.is_unique and values.index.is_unique):\n raise ValueError(\"cannot compute isin with a duplicate axis.\")\n result = self.eq(values.reindex_like(self))\n else:\n if not is_list_like(values):\n raise TypeError(\n \"only list-like or dict-like objects are allowed \"\n \"to be passed to DataFrame.isin(), \"\n f\"you passed a '{type(values).__name__}'\"\n )\n\n def isin_(x):\n # error: Argument 2 to \"isin\" has 
incompatible type \"Union[Series,\n # DataFrame, Sequence[Any], Mapping[Any, Any]]\"; expected\n # \"Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,\n # Series], List[Any], range]\"\n result = algorithms.isin(\n x.ravel(),\n values, # type: ignore[arg-type]\n )\n return result.reshape(x.shape)\n\n res_mgr = self._mgr.apply(isin_)\n result = self._constructor_from_mgr(\n res_mgr,\n axes=res_mgr.axes,\n )\n return result.__finalize__(self, method=\"isin\")\n\n # ----------------------------------------------------------------------\n # Add index and columns\n _AXIS_ORDERS: list[Literal[\"index\", \"columns\"]] = [\"index\", \"columns\"]\n _AXIS_TO_AXIS_NUMBER: dict[Axis, int] = {\n **NDFrame._AXIS_TO_AXIS_NUMBER,\n 1: 1,\n \"columns\": 1,\n }\n _AXIS_LEN = len(_AXIS_ORDERS)\n _info_axis_number: Literal[1] = 1\n _info_axis_name: Literal[\"columns\"] = \"columns\"\n\n index = properties.AxisProperty(\n axis=1,\n doc=\"\"\"\n The index (row labels) of the DataFrame.\n\n The index of a DataFrame is a series of labels that identify each row.\n The labels can be integers, strings, or any other hashable type. The index\n is used for label-based access and alignment, and can be accessed or\n modified using this attribute.\n\n Returns\n -------\n pandas.Index\n The index labels of the DataFrame.\n\n See Also\n --------\n DataFrame.columns : The column labels of the DataFrame.\n DataFrame.to_numpy : Convert the DataFrame to a NumPy array.\n\n Examples\n --------\n >>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],\n ... 'Age': [25, 30, 35],\n ... 'Location': ['Seattle', 'New York', 'Kona']},\n ... index=([10, 20, 30]))\n >>> df.index\n Index([10, 20, 30], dtype='int64')\n\n In this example, we create a DataFrame with 3 rows and 3 columns,\n including Name, Age, and Location information. We set the index labels to\n be the integers 10, 20, and 30. 
We then access the `index` attribute of the\n DataFrame, which returns an `Index` object containing the index labels.\n\n >>> df.index = [100, 200, 300]\n >>> df\n Name Age Location\n 100 Alice 25 Seattle\n 200 Bob 30 New York\n 300 Aritra 35 Kona\n\n In this example, we modify the index labels of the DataFrame by assigning\n a new list of labels to the `index` attribute. The DataFrame is then\n updated with the new labels, and the output shows the modified DataFrame.\n \"\"\",\n )\n columns = properties.AxisProperty(\n axis=0,\n doc=\"\"\"\n The column labels of the DataFrame.\n\n This property holds the column names as a pandas ``Index`` object.\n It provides an immutable sequence of column labels that can be\n used for data selection, renaming, and alignment in DataFrame operations.\n\n Returns\n -------\n pandas.Index\n The column labels of the DataFrame.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.axes: Return a list representing the axes of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})\n >>> df\n A B\n 0 1 3\n 1 2 4\n >>> df.columns\n Index(['A', 'B'], dtype='str')\n \"\"\",\n )\n\n # ----------------------------------------------------------------------\n # Add plotting methods to DataFrame\n plot = Accessor(\"plot\", pandas.plotting.PlotAccessor)\n hist = pandas.plotting.hist_frame\n boxplot = pandas.plotting.boxplot_frame\n sparse = Accessor(\"sparse\", SparseFrameAccessor)\n\n # ----------------------------------------------------------------------\n # Internal Interface Methods\n\n def _to_dict_of_blocks(self) -> dict[str, DataFrame]:\n \"\"\"\n Return a dict of dtype -> Constructor Types that\n each is a homogeneous dtype.\n\n Internal ONLY.\n \"\"\"\n mgr = self._mgr\n return {\n k: self._constructor_from_mgr(v, axes=v.axes).__finalize__(self)\n for k, v in mgr.to_iter_dict()\n }\n\n @property\n def values(self) -> np.ndarray:\n \"\"\"\n Return a Numpy 
representation of the DataFrame.\n\n .. warning::\n\n We recommend using :meth:`DataFrame.to_numpy` instead.\n ``.values`` offers no way to control the output ``dtype``, copy\n semantics, or the value used to fill missing entries, while\n :meth:`DataFrame.to_numpy` exposes those as the ``dtype``,\n ``copy``, and ``na_value`` arguments. The mutability of the\n result also depends on the DataFrame's internal block layout:\n when the DataFrame is backed by a single block the result is a\n read-only view (writes raise); when there are multiple blocks\n the result is a writable copy whose mutations do not propagate\n back to the DataFrame.\n\n Only the values in the DataFrame will be returned, the axes labels\n will be removed.\n\n Returns\n -------\n numpy.ndarray\n The values of the DataFrame.\n\n See Also\n --------\n DataFrame.to_numpy : Recommended alternative to this method.\n DataFrame.index : Retrieve the index labels.\n DataFrame.columns : Retrieving the column names.\n\n Notes\n -----\n The returned array is not intended to be written to. When the\n DataFrame is backed by a single NumPy array (single dtype, single\n block), the result is a read-only view; when the DataFrame has\n multiple internal blocks (e.g. after adding a new column), the\n result is a copy and modifications to it will not be reflected in\n the original DataFrame. Use :meth:`DataFrame.to_numpy` for more\n explicit control over copy behavior, or use :attr:`DataFrame.iloc`\n to modify values in-place.\n\n The dtype will be a lower-common-denominator dtype (implicit\n upcasting); that is to say if the dtypes (even of numeric types)\n are mixed, the one that accommodates all will be chosen. Use this\n with care if you are not dealing with the blocks.\n\n e.g. If the dtypes are float16 and float32, dtype will be upcast to\n float32. If dtypes are int32 and uint8, dtype will be upcast to\n int32. 
By :func:`numpy.find_common_type` convention, mixing int64\n and uint64 will result in a float64 dtype.\n\n Examples\n --------\n A DataFrame where all columns are the same type (e.g., int64) results\n in an array of the same type.\n\n >>> df = pd.DataFrame(\n ... {\"age\": [3, 29], \"height\": [94, 170], \"weight\": [31, 115]}\n ... )\n >>> df\n age height weight\n 0 3 94 31\n 1 29 170 115\n >>> df.dtypes\n age int64\n height int64\n weight int64\n dtype: object\n >>> df.values\n array([[ 3, 94, 31],\n [ 29, 170, 115]])\n\n A DataFrame with mixed type columns(e.g., str/object, int64, float32)\n results in an ndarray of the broadest type that accommodates these\n mixed types (e.g., object).\n\n >>> df2 = pd.DataFrame(\n ... [\n ... (\"parrot\", 24.0, \"second\"),\n ... (\"lion\", 80.5, 1),\n ... (\"monkey\", np.nan, None),\n ... ],\n ... columns=(\"name\", \"max_speed\", \"rank\"),\n ... )\n >>> df2.dtypes\n name str\n max_speed float64\n rank object\n dtype: object\n >>> df2.values\n array([['parrot', 24.0, 'second'],\n ['lion', 80.5, 1],\n ['monkey', nan, None]], dtype=object)\n\n ``DataFrame.to_numpy`` produces the same array by default, but lets\n you choose how missing values are represented and request a\n guaranteed copy:\n\n >>> df3 = pd.DataFrame({\"a\": [1, 2], \"b\": [3.0, np.nan]})\n >>> df3.values\n array([[ 1., 3.],\n [ 2., nan]])\n >>> df3.to_numpy(na_value=-1)\n array([[ 1., 3.],\n [ 2., -1.]])\n >>> df3.to_numpy(dtype=\"float32\", copy=True)\n array([[ 1., 3.],\n [ 2., nan]], dtype=float32)\n \"\"\"\n return self._mgr.as_array()\n\n\ndef _from_nested_dict(\n data: Mapping[HashableT, Mapping[HashableT2, T]],\n) -> collections.defaultdict[HashableT2, dict[HashableT, T]]:\n new_data: collections.defaultdict[HashableT2, dict[HashableT, T]] = (\n collections.defaultdict(dict)\n )\n for index, s in data.items():\n for col, v in s.items():\n new_data[col][index] = v\n return new_data\n\n\ndef _reindex_for_setitem(\n value: DataFrame | Series, index: 
Index\n) -> tuple[ArrayLike, BlockValuesRefs | None]:\n # reindex if necessary\n\n if value.index.equals(index) or not len(index):\n if isinstance(value, Series):\n return value._values, value._references\n return value._values.copy(), None\n\n # GH#4107\n try:\n reindexed_value = value.reindex(index)._values\n except ValueError as err:\n # raised in MultiIndex.from_tuples, see test_insert_error_msmgs\n if not value.index.is_unique:\n # duplicate axis\n raise err\n\n raise TypeError(\n \"incompatible index of inserted column with frame index\"\n ) from err\n```\n
","tags":["context-window","needle-retrieval","python","negative-control","256k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":256000,"needle_position":"absent","needle_count":0,"evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-needle-32k.jsonl b/backend/data/datasets/context-needle-32k.jsonl new file mode 100644 index 0000000..85e3140 --- /dev/null +++ b/backend/data/datasets/context-needle-32k.jsonl @@ -0,0 +1,5 @@ +{"id":"needle-front-32k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-front-32k\nApproximate target context: 32000 tokens; needle position: front.\nFind the Python benchmark needle for needle-front-32k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom 
pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import 
SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n# InferHarness context needle: needle-front-32k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_32K_FRONT\"\n# End InferHarness context needle\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n 
IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). 
If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... 
[(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
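<table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th></th>\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <th>0</th>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <th>1</th>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>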
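\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n   <thead>\n     <tr style=\"text-align: right;\">\n       <th>col1</th>\n       <th>col2</th>\n     </tr>\n   </thead>\n   <tbody>\n     <tr>\n       <td>1</td>\n       <td>4</td>\n     </tr>\n     <tr>\n       <td>2</td>\n       <td>3</td>\n     </tr>\n   </tbody>\n </table>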
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... )\n\n >>> df.to_xml() # doctest: +SKIP\n```\n
","tags":["context-window","needle-retrieval","python","front","32k"],"expected_answer":"IH_NEEDLE_32K_FRONT","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"needle_position":"front","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-middle-32k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-middle-32k\nApproximate target context: 32000 tokens; needle position: middle.\nFind the Python benchmark needle for needle-middle-32k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n# InferHarness context needle: needle-middle-32k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_32K_MIDDLE\"\n# End InferHarness context needle\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
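<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>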
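 \n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>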
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... )\n\n >>> df.to_xml() # doctest: +SKIP\n```\n
","tags":["context-window","needle-retrieval","python","middle","32k"],"expected_answer":"IH_NEEDLE_32K_MIDDLE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"needle_position":"middle","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-late-32k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-late-32k\nApproximate target context: 32000 tokens; needle position: late_80_percent.\nFind the Python benchmark needle for needle-late-32k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import 
(\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n# InferHarness context needle: needle-late-32k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_32K_LATE\"\n# End InferHarness context needle\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
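<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>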
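\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>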
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... )\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n```\n
","tags":["context-window","needle-retrieval","python","late_80_percent","32k"],"expected_answer":"IH_NEEDLE_32K_LATE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"needle_position":"late_80_percent","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-two-facts-32k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-two-facts-32k\nApproximate target context: 32000 tokens; needle position: two_facts_20_and_80_percent.\nFind both Python benchmark needles for needle-two-facts-32k. Reply with the string assigned to BENCHMARK_NEEDLE_ALPHA, then a | separator, then the string assigned to BENCHMARK_NEEDLE_BETA, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom 
pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n# InferHarness context needle: needle-two-facts-32k alpha\nBENCHMARK_NEEDLE_ALPHA = \"IH_NEEDLE_32K_ALPHA\"\n# End InferHarness context needle alpha\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> 
str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(, {'col1': 1, 'col2': 0.5}),\n defaultdict(, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n# InferHarness context needle: needle-two-facts-32k beta\nBENCHMARK_NEEDLE_BETA = \"IH_NEEDLE_32K_BETA\"\n# End InferHarness context needle beta\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n 
engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
014
123
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
14
23
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n```\n
","tags":["context-window","needle-retrieval","python","two-facts","32k"],"expected_answer":"IH_NEEDLE_32K_ALPHA|IH_NEEDLE_32K_BETA","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"needle_position":"two_facts_20_and_80_percent","needle_count":2,"evaluation_mode":"two_fact_exact_values"}} +{"id":"negative-control-32k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: negative-control-32k\nApproximate target context: 32000 tokens; needle position: absent.\nThe source may or may not contain a Python benchmark needle for negative-control-32k. If the needle is absent, reply exactly: NOT_FOUND.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom 
pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import 
NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n 
ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting.
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default).
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
014
123
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
col1col2
14
23
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... )\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n```\n
","tags":["context-window","needle-retrieval","python","negative-control","32k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":32000,"needle_position":"absent","needle_count":0,"evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-needle-64k.jsonl b/backend/data/datasets/context-needle-64k.jsonl new file mode 100644 index 0000000..d106f83 --- /dev/null +++ b/backend/data/datasets/context-needle-64k.jsonl @@ -0,0 +1,5 @@ +{"id":"needle-front-64k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-front-64k\nApproximate target context: 64000 tokens; needle position: front.\nFind the Python benchmark needle for needle-front-64k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom 
pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import 
SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n 
NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n# InferHarness context needle: needle-front-64k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_64K_FRONT\"\n# End InferHarness context needle\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
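<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>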
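\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>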
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index label\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n```\n
","tags":["context-window","needle-retrieval","python","front","64k"],"expected_answer":"IH_NEEDLE_64K_FRONT","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"needle_position":"front","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-middle-64k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-middle-64k\nApproximate target context: 64000 tokens; needle position: middle.\nFind the Python benchmark needle for needle-middle-64k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how reordering the index of the Series does not change the result,\n because labels are aligned before the multiplication.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
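 <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>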
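\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>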
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n# InferHarness context needle: needle-middle-64k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_64K_MIDDLE\"\n# End InferHarness context needle\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : label, list of labels, or tuple\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n Returns DataFrame or None DataFrame with the specified\n index or column labels removed or None if inplace=True.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n```\n
","tags":["context-window","needle-retrieval","python","middle","64k"],"expected_answer":"IH_NEEDLE_64K_MIDDLE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"needle_position":"middle","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-late-64k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-late-64k\nApproximate target context: 64000 tokens; needle position: late_80_percent.\nFind the Python benchmark needle for needle-late-64k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import 
(\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame (or as a Series\n when the result is one-dimensional).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication.
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the labels of an operand does not change the\n result, because the operands are aligned before the multiplication.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False,
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default).
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2.
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated.
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input.
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels.
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
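<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>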
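\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>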
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimation is\n made based on column dtype and number of rows, assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n# InferHarness context needle: needle-late-64k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_64K_LATE\"\n# End InferHarness context needle\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and 
col_indexer is not None:\n # Fastpath. By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n Returns DataFrame or None DataFrame with the specified\n index or column labels removed or None if inplace=True.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n```\n
","tags":["context-window","needle-retrieval","python","late_80_percent","64k"],"expected_answer":"IH_NEEDLE_64K_LATE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"needle_position":"late_80_percent","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-two-facts-64k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-two-facts-64k\nApproximate target context: 64000 tokens; needle position: two_facts_20_and_80_percent.\nFind both Python benchmark needles for needle-two-facts-64k. Reply with the string assigned to BENCHMARK_NEEDLE_ALPHA, then a | separator, then the string assigned to BENCHMARK_NEEDLE_BETA, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom 
pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default).
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n# InferHarness context needle: needle-two-facts-64k alpha\nBENCHMARK_NEEDLE_ALPHA = \"IH_NEEDLE_64K_ALPHA\"\n# End InferHarness context needle alpha\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated.
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
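<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>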
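\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>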
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts (not later versions) are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype)
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n# InferHarness context needle: needle-two-facts-64k beta\nBENCHMARK_NEEDLE_BETA = \"IH_NEEDLE_64K_BETA\"\n# End InferHarness context needle beta\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n```\n
","tags":["context-window","needle-retrieval","python","two-facts","64k"],"expected_answer":"IH_NEEDLE_64K_ALPHA|IH_NEEDLE_64K_BETA","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"needle_position":"two_facts_20_and_80_percent","needle_count":2,"evaluation_mode":"two_fact_exact_values"}} +{"id":"negative-control-64k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: negative-control-64k\nApproximate target context: 64000 tokens; needle position: absent.\nThe source may or may not contain a Python benchmark needle for negative-control-64k. If the needle is absent, reply exactly: NOT_FOUND.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom 
pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import 
NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n 
ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :func:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The row labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
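<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>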
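\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>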
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, in64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys: keys\n not present in the index (here ``\"Seattle\"``) are dropped, and index\n labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n <https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html>`__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop : Drop specified labels from rows or columns.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n```\n
","tags":["context-window","needle-retrieval","python","negative-control","64k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":64000,"needle_position":"absent","needle_count":0,"evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/context-needle-8k.jsonl b/backend/data/datasets/context-needle-8k.jsonl new file mode 100644 index 0000000..7ae9192 --- /dev/null +++ b/backend/data/datasets/context-needle-8k.jsonl @@ -0,0 +1,5 @@ +{"id":"needle-front-8k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-front-8k\nApproximate target context: 8000 tokens; needle position: front.\nFind the Python benchmark needle for needle-front-8k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n# InferHarness context needle: 
needle-front-8k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_8K_FRONT\"\n# End InferHarness context needle\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n 
NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n 
JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). 
If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... 
[(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n```\n
","tags":["context-window","needle-retrieval","python","front","8k"],"expected_answer":"IH_NEEDLE_8K_FRONT","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"needle_position":"front","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-middle-8k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-middle-8k\nApproximate target context: 8000 tokens; needle position: middle.\nFind the Python benchmark needle for needle-middle-8k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n# InferHarness context needle: needle-middle-8k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_8K_MIDDLE\"\n# End InferHarness context needle\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, 
columns, index, dtype=None)\n else:\n arr2d = construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n```\n
","tags":["context-window","needle-retrieval","python","middle","8k"],"expected_answer":"IH_NEEDLE_8K_MIDDLE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"needle_position":"middle","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-late-8k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-late-8k\nApproximate target context: 8000 tokens; needle position: late_80_percent.\nFind the Python benchmark needle for needle-late-8k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n 
_chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n# InferHarness context needle: needle-late-8k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_8K_LATE\"\n# End InferHarness context needle\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full 
repr fits in horizontal boundaries imposed by the display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly 
for IPython notebook.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n```\n
","tags":["context-window","needle-retrieval","python","late_80_percent","8k"],"expected_answer":"IH_NEEDLE_8K_LATE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"needle_position":"late_80_percent","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-two-facts-8k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-two-facts-8k\nApproximate target context: 8000 tokens; needle position: two_facts_20_and_80_percent.\nFind both Python benchmark needles for needle-two-facts-8k. Reply with the string assigned to BENCHMARK_NEEDLE_ALPHA, then a | separator, then the string assigned to BENCHMARK_NEEDLE_BETA, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors 
import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n 
ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n 
ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n# InferHarness context needle: needle-two-facts-8k alpha\nBENCHMARK_NEEDLE_ALPHA = \"IH_NEEDLE_8K_ALPHA\"\n# End InferHarness context needle alpha\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n# InferHarness context needle: needle-two-facts-8k beta\nBENCHMARK_NEEDLE_BETA = \"IH_NEEDLE_8K_BETA\"\n# End InferHarness context needle beta\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> 
bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a 
particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"
<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n```\n
","tags":["context-window","needle-retrieval","python","two-facts","8k"],"expected_answer":"IH_NEEDLE_8K_ALPHA|IH_NEEDLE_8K_BETA","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"needle_position":"two_facts_20_and_80_percent","needle_count":2,"evaluation_mode":"two_fact_exact_values"}} +{"id":"negative-control-8k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: negative-control-8k\nApproximate target context: 8000 tokens; needle position: absent.\nThe source may or may not contain a Python benchmark needle for negative-control-8k. If the needle is absent, reply exactly: NOT_FOUND.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow 
import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom 
pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n 
ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n```\n
","tags":["context-window","needle-retrieval","python","negative-control","8k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":8000,"needle_position":"absent","needle_count":0,"evaluation_mode":"negative_control_not_found"}} diff --git a/backend/data/datasets/positional-recall-python.jsonl b/backend/data/datasets/positional-recall-python.jsonl index a7b51e6..8ec63dd 100644 --- a/backend/data/datasets/positional-recall-python.jsonl +++ b/backend/data/datasets/positional-recall-python.jsonl @@ -1,10 +1,5 @@ -{"id": "positional-recall-python-1", "system_prompt": "You are a concise coding assistant. Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n 
Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom 
pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n 
Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept variour other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'_arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n nor datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
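<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>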
\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
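<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>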
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML document as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, memory usage is\n estimated from the column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe : Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage : Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g.
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a NumPy array (not a Series/TimeSeries), it must be the\n same length as the DataFrame's index or an error will be raised.\n\n Series/TimeSeries
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col are interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speedups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python.
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified\n index or column labels removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating, for each row, whether it is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is marked False and all others are marked True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is marked False and all others are marked True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the smallest element, because all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the largest element, because all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels of the row index (``axis=0``), but column levels\n can be swapped in a similar manner with ``axis=1``. Note that ``axis=0`` is the default.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j.
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ...
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ...
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ...
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ...
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with the operator version, which returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract from a scalar with the operator version, which returns the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract from a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar with the operator version, which returns the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `_is_homogeneous_type`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-2", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the results dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
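<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>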
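\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>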
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\" or \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML document as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
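)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 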
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrame's index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col are interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from NumPy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Note that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine whether a row or column is removed from the DataFrame when it\n has at least one NA value or only NA values.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order; to sort in descending order,\n use ``ascending=False``.\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise (``axis=0``), but levels can be swapped\n column-wise in a similar manner. Note that row-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and a Series by axis, with the operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and a Series by axis, with the operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with the operator version, which returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and a Series by axis, with the operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract from a scalar with the operator version, which returns the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract from a list and a Series by axis, with the operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar with the operator version, which returns the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and a Series by axis, with the operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe keeps the values of the 'first' dataframe in place of the\n second one's values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update;\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `iterrows`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-3", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param.
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False,
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default).
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2.
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated.
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels.
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
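<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>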
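\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>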
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
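)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 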
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of the column count and dtypes, but not per-column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get\n the buffer content, and write it to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, which is\n especially useful for big DataFrames and for fine-tuning memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible\n for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n the default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine whether a row or column is removed from the DataFrame\n when it has at least one NA value or only NA values.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - ``False`` : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series marking each duplicated row.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise, but levels can be swapped column-wise\n in a similar manner. Note that row-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with the operator version, which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with the operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `to_records`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-4", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Return the number of rows, i.e. the length of the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as the ``@`` operator.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is a NumPy array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the index of other does not change the result,\n because labels are aligned before the multiplication.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
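<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>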
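\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>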
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtype objects to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change the input DataFrame (though pandas does not check this).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n ``errors='raise'``.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide <missing_data>` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series marking each duplicated row.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to False and all others to True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to False and all others to True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n the ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``;\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``;\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise, but levels can be swapped column-wise\n in a similar manner. Note that row-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `to_stata`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-5", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of another Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other as a DataFrame (or as a Series\n if the result is one-dimensional).\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
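<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>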
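\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>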
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported, not later versions.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
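)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 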
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection, a memory estimate is\n made based on column dtype and number of rows, assuming values\n consume the same amount of memory for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe : Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage : Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default;\n pass ``deep=True`` to include it:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible\n for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : label, list of labels, or tuple\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels that are not in the DataFrame index are ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : Evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set to False and all others to True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set to False and all others to True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to False, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `query`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-6", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n 
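<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>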
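\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>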
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported; later versions are not.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
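)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 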
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always shows memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based on column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n are the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible\n for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g.
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrame's index or an error will be raised.\n\n Series/TimeSeries
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put a single value at the passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Whether index/col are interpreted as positional indexers.\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc., :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, gets mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing value.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna: Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the column values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the smallest element; all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can go beyond ``n``\n if there are duplicate values for the largest element; all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels row-wise (``axis=0``), but levels can be swapped\n column-wise in a similar manner. Note that row-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `select_dtypes`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-7", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_\n - `Narwhals <https://github.com/narwhals-dev/narwhals>`_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ```` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
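<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>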
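\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>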
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve a single value at the passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` hold; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly differently from standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'} or None, default None\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speedups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python;\n however, the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as columns in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a valid Python identifier.\n This can lead to the following problems.\n\n During parsing, a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # NumPy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the next valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details: \"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop : Drop specified labels from rows or columns.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with ``how``.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes,\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts : Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating, for each row, whether it is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order, to sort in descending order,\n use ``ascending=False``\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can exceed ``n``.\n If there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can exceed ``n``.\n If there are duplicate values for the largest element, all the\n ties are kept:\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we swap levels of the row index (``axis=0``, the default), but column\n levels can be swapped in a similar manner with ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which returns the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which returns the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which returns the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which returns the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which returns the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `reindex`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-8", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given, each\n integer corresponds to one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex'\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when writing the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A CSS id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
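<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>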
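\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>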
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts are currently supported, not later versions.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Render a DataFrame as an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
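)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 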
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique` are True; the caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that don't match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, int64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change the input DataFrame (though pandas doesn't check this).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to propagate the next valid value backward to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do the operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exist, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns', None}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n # Keep only the rows that are not flagged as duplicates\n result = self[~self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean Series indicating, for each row, whether it is a duplicate.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is marked False and all others are marked True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n With ``keep=\"last\"``, the last occurrence of each set of duplicated values\n is marked False and all others are marked True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n With ``keep=False``, all duplicates are marked True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n # Factorize one column: return its integer codes and the number of uniques\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n DataFrame sorted by the labels, or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``.\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels column-wise, but levels can be swapped row-wise\n in a similar manner. Note that column-wise is the default behaviour.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and returns a Series or a\n scalar. Used to merge the two dataframes column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `_align_for_op`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-9", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method give the same result as @\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method works also if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note how shuffling of the objects does not change the result.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :meth:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n If the DataFrame index has no label then the recarray field name\n is set to 'index'. If the index has a label then this is used as the\n field name:\n\n >>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])\n\n The index can be excluded from the record array:\n\n >>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '<i8'), ('B', '<f8')])\n\n Data types can be specified for the columns:\n\n >>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])\n\n As well as for the index:\n\n >>> df.to_records(index_dtypes=\"<S2\")\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])\n\n >>> index_dtypes = f\"<S{df.index.str.len().max()}\"\n >>> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
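<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>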
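\n\n HTML output\n\n +----+-----+-----+\n | |col1 |col2 |\n +====+=====+=====+\n |0 |1 |4 |\n +----+-----+-----+\n |1 |2 |3 |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>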
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts and not later versions is currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``io`` is None, returns the resulting XML format as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to a html.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
)\n\n >>> df.to_xml() # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n \n \n \n \n \n \n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n \n \n \n 0\n square\n 360\n 4.0\n \n \n 1\n circle\n 360\n \n \n \n 2\n triangle\n 180\n 3.0\n \n \n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrames index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, in64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must verify 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas uses index alignment in case of `value` from type `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels\n removed, or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop rows by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is raised.\n axis : {0 or 'index', 1 or 'columns', None}, default None\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, etc. :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide <missing_data>` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is set on False and all others on True.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is set on False and all others on True.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` on False, all duplicates are True.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the `letter's case `__\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n `natural sorting `__.\n This can be done using\n ``natsort`` `package `__,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``.\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of element kept can go beyond ``n``\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we will swap the levels of the row index (``axis=0``, the default), but\n column levels can be swapped in a similar manner by passing ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two Series as inputs and returns a Series or a\n scalar. Used to merge the two DataFrames column by column.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axes differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raise a `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `pow`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} -{"id": "positional-recall-python-10", "system_prompt": "You are a concise coding assistant. 
Return only the requested code.", "prompt": "Here is the complete source of `frame.py`:\n\n```python\n\"\"\"\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common 
import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom 
pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. 
Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n # error: Item \"ndarray\" of \"Union[ndarray, Series, Index]\" has no\n # attribute \"name\"\n {data.name: data},\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n # For data is list-like, or Iterable (will consume into list)\n elif is_list_like(data):\n if not isinstance(data, abc.Sequence):\n if hasattr(data, \"__array__\"):\n # GH#44616 big perf improvement for e.g. pytorch tensor\n data = np.asarray(data)\n else:\n data = list(data)\n if len(data) > 0:\n if is_dataclass(data[0]):\n data = dataclasses_to_dicts(data)\n if not isinstance(data, np.ndarray) and treat_as_nested(data):\n # exclude ndarray as we may have cast it a few lines above\n if columns is not None:\n columns = ensure_index(columns)\n arrays, columns, index = nested_data_to_arrays(\n # error: Argument 3 to \"nested_data_to_arrays\" has incompatible\n # type \"Optional[Collection[Any]]\"; expected \"Optional[Index]\"\n data,\n columns,\n index, # type: ignore[arg-type]\n dtype,\n )\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n )\n else:\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n else:\n mgr = dict_to_mgr(\n {},\n index,\n columns if columns is not None else default_index(0),\n dtype=dtype,\n )\n # For data is scalar\n else:\n if index is None or columns is None:\n raise ValueError(\"DataFrame constructor not properly called!\")\n\n index = ensure_index(index)\n columns = ensure_index(columns)\n\n if not dtype:\n dtype, _ = infer_dtype_from_scalar(data)\n\n # For data is a scalar extension dtype\n if isinstance(dtype, ExtensionDtype):\n # TODO(EA2D): special case not needed with 2D EAs\n\n values = [\n construct_1d_arraylike_from_scalar(data, len(index), dtype)\n for _ in range(len(columns))\n ]\n mgr = arrays_to_mgr(values, columns, index, dtype=None)\n else:\n arr2d = 
construct_2d_arraylike_from_scalar(\n data,\n len(index),\n len(columns),\n dtype,\n copy,\n )\n\n mgr = ndarray_to_mgr(\n arr2d,\n index,\n columns,\n dtype=arr2d.dtype,\n copy=False,\n )\n\n NDFrame.__init__(self, mgr)\n\n # ----------------------------------------------------------------------\n\n def __dataframe__(\n self, nan_as_null: bool = False, allow_copy: bool = True\n ) -> DataFrameXchg:\n \"\"\"\n Return the dataframe interchange object implementing the interchange protocol.\n\n .. deprecated:: 3.0.0\n\n The Dataframe Interchange Protocol is deprecated.\n For dataframe-agnostic code, you may want to look into:\n\n - `Arrow PyCapsule Interface `_\n - `Narwhals `_\n\n .. note::\n\n For new development, we highly recommend using the Arrow C Data Interface\n alongside the Arrow PyCapsule Interface instead of the interchange protocol\n\n .. warning::\n\n Due to severe implementation issues, we recommend only considering using the\n interchange protocol in the following cases:\n\n - converting to pandas: for pandas >= 2.0.3\n - converting from pandas: for pandas >= 3.0.0\n\n Parameters\n ----------\n nan_as_null : bool, default False\n `nan_as_null` is DEPRECATED and has no effect. Please avoid using\n it; it will be removed in a future release.\n allow_copy : bool, default True\n Whether to allow memory copying when exporting. 
If set to False\n it would cause non-zero-copy exports to fail.\n\n Returns\n -------\n DataFrame interchange object\n The object which consuming library can use to ingress the dataframe.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n\n Notes\n -----\n Details on the interchange protocol:\n https://data-apis.org/dataframe-protocol/latest/index.html\n\n Examples\n --------\n >>> df_not_necessarily_pandas = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> interchange_object = df_not_necessarily_pandas.__dataframe__()\n >>> interchange_object.column_names()\n Index(['A', 'B'], dtype='str')\n >>> df_pandas = pd.api.interchange.from_dataframe(\n ... interchange_object.select_columns_by_name([\"A\"])\n ... )\n >>> df_pandas\n A\n 0 1\n 1 2\n\n These methods (``column_names``, ``select_columns_by_name``) should work\n for any dataframe library which implements the interchange protocol.\n \"\"\"\n warnings.warn(\n \"The Dataframe Interchange Protocol is deprecated.\\n\"\n \"For dataframe-agnostic code, you may want to look into:\\n\"\n \"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\\n\"\n \"- Narwhals: https://github.com/narwhals-dev/narwhals\\n\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n from pandas.core.interchange.dataframe import PandasDataFrameXchg\n\n return PandasDataFrameXchg(self, allow_copy=allow_copy)\n\n def __arrow_c_stream__(self, requested_schema=None):\n \"\"\"\n Export the pandas DataFrame as an Arrow C stream PyCapsule.\n\n This relies on pyarrow to convert the pandas DataFrame to the Arrow\n format (and follows the default behaviour of ``pyarrow.Table.from_pandas``\n in its handling of the index, i.e. 
store the index as a column except\n for RangeIndex).\n This conversion is not necessarily zero-copy.\n\n Parameters\n ----------\n requested_schema : PyCapsule, default None\n The schema to which the dataframe should be casted, passed as a\n PyCapsule containing a C ArrowSchema representation of the\n requested schema.\n\n Returns\n -------\n PyCapsule\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if requested_schema is not None:\n requested_schema = pa.Schema._import_from_c_capsule(requested_schema)\n table = pa.Table.from_pandas(self, schema=requested_schema)\n return table.__arrow_c_stream__()\n\n # ----------------------------------------------------------------------\n\n @property\n def axes(self) -> list[Index]:\n \"\"\"\n Return a list representing the axes of the DataFrame.\n\n It has the row axis labels and column axis labels as the only members.\n They are returned in that order.\n\n See Also\n --------\n DataFrame.index: The index (row labels) of the DataFrame.\n DataFrame.columns: The column labels of the DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.axes\n [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]\n \"\"\"\n return [self.index, self.columns]\n\n @property\n def shape(self) -> tuple[int, int]:\n \"\"\"\n Return a tuple representing the dimensionality of the DataFrame.\n\n Unlike the `len()` method, which only returns the number of rows, `shape`\n provides both row and column counts, making it a more informative method for\n understanding dataset size.\n\n See Also\n --------\n numpy.ndarray.shape : Tuple of array dimensions.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.shape\n (2, 2)\n\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4], \"col3\": [5, 6]})\n >>> df.shape\n (2, 3)\n \"\"\"\n return len(self.index), len(self.columns)\n\n @property\n def 
_is_homogeneous_type(self) -> bool:\n \"\"\"\n Whether all the columns in a DataFrame have the same type.\n\n Returns\n -------\n bool\n\n Examples\n --------\n >>> DataFrame({\"A\": [1, 2], \"B\": [3, 4]})._is_homogeneous_type\n True\n >>> DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.0]})._is_homogeneous_type\n False\n\n Items with the same type but different sizes are considered\n different types.\n\n >>> DataFrame(\n ... {\n ... \"A\": np.array([1, 2], dtype=np.int32),\n ... \"B\": np.array([1, 2], dtype=np.int64),\n ... }\n ... )._is_homogeneous_type\n False\n \"\"\"\n # The \"<\" part of \"<=\" here is for empty DataFrame cases\n return len({block.values.dtype for block in self._mgr.blocks}) <= 1\n\n @property\n def _can_fast_transpose(self) -> bool:\n \"\"\"\n Can we transpose this DataFrame without creating any new array objects.\n \"\"\"\n blocks = self._mgr.blocks\n if len(blocks) != 1:\n return False\n\n dtype = blocks[0].dtype\n # TODO(EA2D) special case would be unnecessary with 2D EAs\n return not is_1d_only_ea_dtype(dtype)\n\n @property\n def _values(self) -> np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray:\n \"\"\"\n Analogue to ._values that may return a 2D ExtensionArray.\n \"\"\"\n mgr = self._mgr\n\n blocks = mgr.blocks\n if len(blocks) != 1:\n return ensure_wrapped_if_datetimelike(self.values)\n\n arr = blocks[0].values\n if arr.ndim == 1:\n # non-2D ExtensionArray\n return self.values\n\n # more generally, whatever we allow in NDArrayBackedExtensionBlock\n arr = cast(\"np.ndarray | DatetimeArray | TimedeltaArray | PeriodArray\", arr)\n return arr.T\n\n # ----------------------------------------------------------------------\n # Rendering Methods\n\n def _repr_fits_vertical_(self) -> bool:\n \"\"\"\n Check length against max_rows.\n \"\"\"\n max_rows = config[\"display\"][\"max_rows\"]\n return len(self) <= max_rows\n\n def _repr_fits_horizontal_(self) -> bool:\n \"\"\"\n Check if full repr fits in horizontal boundaries imposed by the 
display\n options width and max_columns.\n \"\"\"\n width, height = console.get_console_size()\n max_columns = config[\"display\"][\"max_columns\"]\n nb_columns = len(self.columns)\n\n # exceed max columns\n if (max_columns and nb_columns > max_columns) or (\n width and nb_columns > (width // 2)\n ):\n return False\n\n # used by repr_html under IPython notebook or scripts ignore terminal\n # dims\n if width is None or not console.in_interactive_session():\n return True\n\n if config[\"display\"][\"width\"] is not None or console.in_ipython_frontend():\n # check at least the column row for excessive width\n max_rows = 1\n else:\n max_rows = config[\"display\"][\"max_rows\"]\n\n # when auto-detecting, so width=None and not in ipython front end\n # check whether repr fits horizontal by actually checking\n # the width of the rendered repr\n buf = StringIO()\n\n # only care about the stuff we'll actually print out\n # and to_string on entire frame may be expensive\n d = self\n\n if max_rows is not None: # unlimited rows\n # min of two, where one may be None\n d = d.iloc[: min(max_rows, len(d))]\n else:\n return True\n\n d.to_string(buf=buf)\n value = buf.getvalue()\n repr_width = max(len(line) for line in value.split(\"\\n\"))\n\n return repr_width < width\n\n def _info_repr(self) -> bool:\n \"\"\"\n True if the repr should show the info view.\n \"\"\"\n info_repr_option = config[\"display\"][\"large_repr\"] == \"info\"\n return info_repr_option and not (\n self._repr_fits_horizontal_() and self._repr_fits_vertical_()\n )\n\n def __repr__(self) -> str:\n \"\"\"\n Return a string representation for a particular DataFrame.\n \"\"\"\n if self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n return buf.getvalue()\n\n repr_params = fmt.get_dataframe_repr_params()\n return self.to_string(**repr_params)\n\n def _repr_html_(self) -> str | None:\n \"\"\"\n Return a html representation for a particular DataFrame.\n\n Mainly for IPython notebook.\n \"\"\"\n if 
self._info_repr():\n buf = StringIO()\n self.info(buf=buf)\n # need to escape the <class>, should be the first line.\n val = buf.getvalue().replace(\"<\", r\"&lt;\", 1)\n val = val.replace(\">\", r\"&gt;\", 1)\n return f\"<pre>{val}</pre>
\"\n\n if config[\"display\"][\"notebook_repr_html\"]:\n max_rows = config[\"display\"][\"max_rows\"]\n min_rows = config[\"display\"][\"min_rows\"]\n max_cols = config[\"display\"][\"max_columns\"]\n show_dimensions = config[\"display\"][\"show_dimensions\"]\n show_floats = config[\"display\"][\"float_format\"]\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=None,\n col_space=None,\n na_rep=\"NaN\",\n formatters=None,\n float_format=show_floats,\n sparsify=None,\n justify=None,\n index_names=True,\n header=True,\n index=True,\n bold_rows=True,\n escape=True,\n max_rows=max_rows,\n min_rows=min_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=\".\",\n )\n return fmt.DataFrameRenderer(formatter).to_html(notebook=True)\n else:\n return None\n\n @overload\n def to_string(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = ...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n @overload\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = ...,\n col_space: int | list[int] | dict[Hashable, int] | None = ...,\n header: bool | SequenceNotStr[str] = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: fmt.FormattersType | None = ...,\n float_format: fmt.FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool = 
...,\n decimal: str = ...,\n line_width: int | None = ...,\n min_rows: int | None = ...,\n max_colwidth: int | None = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n def to_string(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: int | list[int] | dict[Hashable, int] | None = None,\n header: bool | SequenceNotStr[str] = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: fmt.FormattersType | None = None,\n float_format: fmt.FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool = False,\n decimal: str = \".\",\n line_width: int | None = None,\n min_rows: int | None = None,\n max_colwidth: int | None = None,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to a console-friendly tabular output.\n\n This method converts the DataFrame to a string representation suitable\n for printing or writing to a file.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : int, list or dict of int, optional\n The minimum width of each column. If a list of ints is given every\n integers corresponds with one column. If a dict is given, the key\n references the column, while the value defines the space to use.\n header : bool or list of str, optional\n Write out the column names. If a list of columns is given, it is\n assumed to be aliases for the column names.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. 
functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n line_width : int, optional\n Width to wrap a line in characters.\n min_rows : int, optional\n The number of rows to display in the console in a truncated repr\n (when number of rows is above `max_rows`).\n max_colwidth : int, optional\n Max width to truncate each column in characters. By default, no limit.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. 
Otherwise returns\n None.\n\n See Also\n --------\n to_html : Convert DataFrame to HTML.\n\n Examples\n --------\n >>> d = {\"col1\": [1, 2, 3], \"col2\": [4, 5, 6]}\n >>> df = pd.DataFrame(d)\n >>> print(df.to_string())\n col1 col2\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n from pandas import option_context\n\n with option_context(\"display.max_colwidth\", max_colwidth):\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n formatters=formatters,\n float_format=float_format,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n header=header,\n index=index,\n min_rows=min_rows,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n decimal=decimal,\n )\n return fmt.DataFrameRenderer(formatter).to_string(\n buf=buf,\n encoding=encoding,\n line_width=line_width,\n )\n\n def _get_values_for_csv(\n self,\n *,\n float_format: FloatFormatType | None,\n date_format: str | None,\n decimal: str,\n na_rep: str,\n quoting, # int csv.QUOTE_FOO from stdlib\n ) -> DataFrame:\n # helper used by to_csv\n mgr = self._mgr.get_values_for_csv(\n float_format=float_format,\n date_format=date_format,\n decimal=decimal,\n na_rep=na_rep,\n quoting=quoting,\n )\n return self._constructor_from_mgr(mgr, axes=mgr.axes)\n\n # ----------------------------------------------------------------------\n\n @property\n def style(self) -> Styler:\n \"\"\"\n Returns a Styler object.\n\n Contains methods for building a styled HTML representation of the DataFrame.\n\n See Also\n --------\n io.formats.style.Styler : Helps style a DataFrame or Series according to the\n data with HTML and CSS.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df.style # doctest: +SKIP\n\n Please see\n `Table Visualization <../../user_guide/style.ipynb>`_ for more examples.\n \"\"\"\n # Raise AttributeError so that inspect works even if jinja2 is not installed.\n has_jinja2 = import_optional_dependency(\"jinja2\", 
errors=\"ignore\")\n if not has_jinja2:\n raise AttributeError(\"The '.style' accessor requires jinja2\")\n\n from pandas.io.formats.style import Styler\n\n return Styler(self)\n\n def items(self) -> Iterable[tuple[Hashable, Series]]:\n r\"\"\"\n Iterate over (column name, Series) pairs.\n\n Iterates over the DataFrame columns, returning a tuple with\n the column name and the content as a Series.\n\n Yields\n ------\n label : object\n The column names for the DataFrame being iterated over.\n content : Series\n The column entries belonging to each label, as a Series.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as\n (index, Series) pairs.\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples\n of the values.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"species\": [\"bear\", \"bear\", \"marsupial\"],\n ... \"population\": [1864, 22000, 80000],\n ... },\n ... index=[\"panda\", \"polar\", \"koala\"],\n ... )\n >>> df\n species population\n panda bear 1864\n polar bear 22000\n koala marsupial 80000\n >>> for label, content in df.items():\n ... print(f\"label: {label}\")\n ... print(f\"content: {content}\", sep=\"\\n\")\n label: species\n content:\n panda bear\n polar bear\n koala marsupial\n Name: species, dtype: str\n label: population\n content:\n panda 1864\n polar 22000\n koala 80000\n Name: population, dtype: int64\n \"\"\"\n for i, k in enumerate(self.columns):\n yield k, self._ixs(i, axis=1)\n\n def iterrows(self) -> Iterable[tuple[Hashable, Series]]:\n \"\"\"\n Iterate over DataFrame rows as (index, Series) pairs.\n\n Each row is yielded as a (index, Series) tuple; the Series has\n the same index as the DataFrame columns. Note that dtypes may\n not be preserved across rows. Prefer :meth:`itertuples` for\n speed and type consistency.\n\n Yields\n ------\n index : label or tuple of label\n The index of the row. 
A tuple for a `MultiIndex`.\n data : Series\n The data of the row as a Series.\n\n See Also\n --------\n DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n 1. Because ``iterrows`` returns a Series for each row,\n it does **not** preserve dtypes across the rows (dtypes are\n preserved across columns for DataFrames).\n\n To preserve dtypes while iterating over the rows, it is better\n to use :meth:`itertuples` which returns namedtuples of the values\n and which is generally faster than ``iterrows``.\n\n 2. You should **never modify** something you are iterating over.\n This is not guaranteed to work in all cases. Depending on the\n data types, the iterator returns a copy and not a view, and writing\n to it will have no effect.\n\n Examples\n --------\n\n >>> df = pd.DataFrame([[1, 1.5]], columns=[\"int\", \"float\"])\n >>> row = next(df.iterrows())[1]\n >>> row\n int 1.0\n float 1.5\n Name: 0, dtype: float64\n >>> print(row[\"int\"].dtype)\n float64\n >>> print(df[\"int\"].dtype)\n int64\n \"\"\"\n columns = self.columns\n klass = self._constructor_sliced\n for k, v in zip(self.index, self.values, strict=True):\n s = klass(v, index=columns, name=k).__finalize__(self)\n if self._mgr.is_single_block:\n s._mgr.add_references(self._mgr)\n yield k, s\n\n def itertuples(\n self, index: bool = True, name: str | None = \"Pandas\"\n ) -> Iterable[tuple[Any, ...]]:\n \"\"\"\n Iterate over DataFrame rows as namedtuples.\n\n Each row becomes a namedtuple (or plain tuple if ``name`` is\n None) with field names taken from the column names or\n positional names. 
Generally faster and more type-stable than\n :meth:`iterrows`.\n\n Parameters\n ----------\n index : bool, default True\n If True, return the index as the first element of the tuple.\n name : str or None, default \"Pandas\"\n The name of the returned namedtuples or None to return regular\n tuples.\n\n Returns\n -------\n iterator\n An object to iterate over namedtuples for each row in the\n DataFrame with the first field possibly being the index and\n following fields being the column values.\n\n See Also\n --------\n DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)\n pairs.\n DataFrame.items : Iterate over (column name, Series) pairs.\n\n Notes\n -----\n The column names will be renamed to positional names if they are\n invalid Python identifiers, repeated, or start with an underscore.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [4, 2], \"num_wings\": [0, 2]}, index=[\"dog\", \"hawk\"]\n ... )\n >>> df\n num_legs num_wings\n dog 4 0\n hawk 2 2\n >>> for row in df.itertuples():\n ... print(row)\n Pandas(Index='dog', num_legs=4, num_wings=0)\n Pandas(Index='hawk', num_legs=2, num_wings=2)\n\n By setting the `index` parameter to False we can remove the index\n as the first element of the tuple:\n\n >>> for row in df.itertuples(index=False):\n ... print(row)\n Pandas(num_legs=4, num_wings=0)\n Pandas(num_legs=2, num_wings=2)\n\n With the `name` parameter set we set a custom name for the yielded\n namedtuples:\n\n >>> for row in df.itertuples(name=\"Animal\"):\n ... 
print(row)\n Animal(Index='dog', num_legs=4, num_wings=0)\n Animal(Index='hawk', num_legs=2, num_wings=2)\n \"\"\"\n arrays = []\n fields = list(self.columns)\n if index:\n arrays.append(self.index)\n fields.insert(0, \"Index\")\n\n # use integer indexing because of possible duplicate column names\n arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))\n\n if name is not None:\n # https://github.com/python/mypy/issues/9046\n # error: namedtuple() expects a string literal as the first argument\n itertuple = collections.namedtuple( # type: ignore[misc]\n name, fields, rename=True\n )\n return map(itertuple._make, zip(*arrays, strict=True))\n\n # fallback to regular tuples\n return zip(*arrays, strict=True)\n\n def __len__(self) -> int:\n \"\"\"\n Returns length of info axis, but here we use the index.\n \"\"\"\n return len(self.index)\n\n @overload\n def dot(self, other: Series) -> Series: ...\n\n @overload\n def dot(self, other: DataFrame | Index | ArrayLike) -> DataFrame: ...\n\n def dot(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Compute the matrix multiplication between the DataFrame and other.\n\n This method computes the matrix product between the DataFrame and the\n values of an other Series, DataFrame or a numpy array.\n\n It can also be called using ``self @ other``.\n\n Parameters\n ----------\n other : Series, DataFrame or array-like\n The other object to compute the matrix product with.\n\n Returns\n -------\n Series or DataFrame\n If other is a Series, return the matrix product between self and\n other as a Series. If other is a DataFrame or a numpy.array, return\n the matrix product of self and other in a DataFrame of a np.array.\n\n See Also\n --------\n Series.dot: Similar method for Series.\n\n Notes\n -----\n The dimensions of DataFrame and other must be compatible in order to\n compute the matrix multiplication. 
In addition, the column names of\n DataFrame and the index of other must contain the same values, as they\n will be aligned prior to the multiplication.\n\n The dot method for Series computes the inner product, instead of the\n matrix product here.\n\n Examples\n --------\n Here we multiply a DataFrame with a Series.\n\n >>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])\n >>> s = pd.Series([1, 1, 2, 1])\n >>> df.dot(s)\n 0 -4\n 1 5\n dtype: int64\n\n Here we multiply a DataFrame with another DataFrame.\n\n >>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(other)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that the dot method gives the same result as ``@``.\n\n >>> df @ other\n 0 1\n 0 1 4\n 1 2 2\n\n The dot method also works if other is an np.array.\n\n >>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])\n >>> df.dot(arr)\n 0 1\n 0 1 4\n 1 2 2\n\n Note that reordering the index of other does not change the result,\n because labels are aligned before the multiplication.\n\n >>> s2 = s.reindex([1, 0, 2, 3])\n >>> df.dot(s2)\n 0 -4\n 1 5\n dtype: int64\n \"\"\"\n if isinstance(other, (Series, DataFrame)):\n common = self.columns.union(other.index)\n if len(common) > len(self.columns) or len(common) > len(other.index):\n raise ValueError(\"matrices are not aligned\")\n\n left = self.reindex(columns=common)\n right = other.reindex(index=common)\n lvals = left.values\n rvals = right._values\n else:\n left = self\n lvals = self.values\n rvals = np.asarray(other)\n if lvals.shape[1] != rvals.shape[0]:\n raise ValueError(\n f\"Dot product shape mismatch, {lvals.shape} vs {rvals.shape}\"\n )\n\n if isinstance(other, DataFrame):\n common_type = find_common_type(list(self.dtypes) + list(other.dtypes))\n return self._constructor(\n np.dot(lvals, rvals),\n index=left.index,\n columns=other.columns,\n copy=False,\n dtype=common_type,\n )\n elif isinstance(other, Series):\n common_type = find_common_type([*list(self.dtypes), other.dtypes])\n return self._constructor_sliced(\n np.dot(lvals, rvals), index=left.index, copy=False, 
dtype=common_type\n )\n elif isinstance(rvals, (np.ndarray, Index)):\n result = np.dot(lvals, rvals)\n if result.ndim == 2:\n return self._constructor(result, index=left.index, copy=False)\n else:\n return self._constructor_sliced(result, index=left.index, copy=False)\n else: # pragma: no cover\n raise TypeError(f\"unsupported type: {type(other)}\")\n\n @overload\n def __matmul__(self, other: Series) -> Series: ...\n\n @overload\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series: ...\n\n def __matmul__(self, other: AnyArrayLike | DataFrame) -> DataFrame | Series:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n return self.dot(other)\n\n def __rmatmul__(self, other) -> DataFrame:\n \"\"\"\n Matrix multiplication using binary `@` operator.\n \"\"\"\n try:\n return self.T.dot(np.transpose(other)).T\n except ValueError as err:\n if \"shape mismatch\" not in str(err):\n raise\n # GH#21581 give exception message for original shapes\n msg = f\"shapes {np.shape(other)} and {self.shape} not aligned\"\n raise ValueError(msg) from err\n\n # ----------------------------------------------------------------------\n # IO methods (to / from other formats)\n\n @classmethod\n def from_arrow(\n cls, data: ArrowArrayExportable | ArrowStreamExportable\n ) -> DataFrame:\n \"\"\"\n Construct a DataFrame from a tabular Arrow object.\n\n This function accepts any Arrow-compatible tabular object implementing\n the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``\n or ``__arrow_c_stream__`` method).\n\n This function currently relies on ``pyarrow`` to convert the tabular\n object in Arrow format to pandas.\n\n .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n .. versionadded:: 3.0\n\n Parameters\n ----------\n data : pyarrow.Table or Arrow-compatible table\n Any tabular object implementing the Arrow PyCapsule Protocol\n (i.e. 
has an ``__arrow_c_array__`` or ``__arrow_c_stream__``\n method).\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n Series.from_arrow : Construct a Series from an Arrow object.\n\n Examples\n --------\n >>> import pyarrow as pa\n >>> table = pa.table({\"a\": [1, 2, 3], \"b\": [\"x\", \"y\", \"z\"]})\n >>> pd.DataFrame.from_arrow(table)\n a b\n 0 1 x\n 1 2 y\n 2 3 z\n \"\"\"\n pa = import_optional_dependency(\"pyarrow\", min_version=\"14.0.0\")\n if not isinstance(data, pa.Table):\n if not (\n hasattr(data, \"__arrow_c_array__\")\n or hasattr(data, \"__arrow_c_stream__\")\n ):\n # explicitly test this, because otherwise we would accept various other\n # input types through the pa.table(..) call\n raise TypeError(\n \"Expected an Arrow-compatible tabular object (i.e. having an \"\n \"'__arrow_c_array__' or '__arrow_c_stream__' method), got \"\n f\"'{type(data).__name__}' instead.\"\n )\n pa_table = pa.table(data)\n else:\n pa_table = data\n\n df = pa_table.to_pandas()\n return df\n\n @classmethod\n def from_dict(\n cls,\n data: dict,\n orient: FromDictOrient = \"columns\",\n dtype: Dtype | None = None,\n columns: Axes | None = None,\n ) -> DataFrame:\n \"\"\"\n Construct DataFrame from dict of array-like or dicts.\n\n Creates DataFrame object from dictionary by columns or by index\n allowing dtype specification.\n\n Parameters\n ----------\n data : dict\n Of the form {field : array-like} or {field : dict}.\n\n .. deprecated:: 3.1.0\n Passing a non-dict to ``from_dict`` is deprecated.\n Use the :class:`DataFrame` constructor instead.\n orient : {'columns', 'index', 'tight'}, default 'columns'\n The \"orientation\" of the data. If the keys of the passed dict\n should be the columns of the resulting DataFrame, pass 'columns'\n (default). 
Otherwise if the keys should be rows, pass 'index'.\n If 'tight', assume a dict with keys ['index', 'columns', 'data',\n 'index_names', 'column_names'].\n\n dtype : dtype, default None\n Data type to force after DataFrame construction, otherwise infer.\n columns : list, default None\n Column labels to use when ``orient='index'``. Raises a ValueError\n if used with ``orient='columns'`` or ``orient='tight'``.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_records : DataFrame from structured ndarray, sequence\n of tuples or dicts, or DataFrame.\n DataFrame : DataFrame object creation using constructor.\n DataFrame.to_dict : Convert the DataFrame to a dictionary.\n\n Examples\n --------\n By default the keys of the dict become the DataFrame columns:\n\n >>> data = {\"col_1\": [3, 2, 1, 0], \"col_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Specify ``orient='index'`` to create the DataFrame using dictionary\n keys as rows:\n\n >>> data = {\"row_1\": [3, 2, 1, 0], \"row_2\": [\"a\", \"b\", \"c\", \"d\"]}\n >>> pd.DataFrame.from_dict(data, orient=\"index\")\n 0 1 2 3\n row_1 3 2 1 0\n row_2 a b c d\n\n When using the 'index' orientation, the column names can be\n specified manually:\n\n >>> pd.DataFrame.from_dict(data, orient=\"index\", columns=[\"A\", \"B\", \"C\", \"D\"])\n A B C D\n row_1 3 2 1 0\n row_2 a b c d\n\n Specify ``orient='tight'`` to create the DataFrame using a 'tight'\n format:\n\n >>> data = {\n ... \"index\": [(\"a\", \"b\"), (\"a\", \"c\")],\n ... \"columns\": [(\"x\", 1), (\"y\", 2)],\n ... \"data\": [[1, 3], [2, 4]],\n ... \"index_names\": [\"n1\", \"n2\"],\n ... \"column_names\": [\"z1\", \"z2\"],\n ... 
}\n >>> pd.DataFrame.from_dict(data, orient=\"tight\")\n z1 x y\n z2 1 2\n n1 n2\n a b 1 3\n c 2 4\n \"\"\"\n index: list | Index | None = None\n if not isinstance(data, dict):\n warnings.warn(\n f\"Passing a {type(data).__name__} to DataFrame.from_dict is \"\n \"deprecated. Use the DataFrame constructor instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n orient = orient.lower() # type: ignore[assignment]\n if orient == \"index\":\n if len(data) > 0:\n # TODO speed up Series case\n if isinstance(next(iter(data.values())), (Series, dict)):\n data = _from_nested_dict(data)\n else:\n index = list(data.keys())\n # error: Incompatible types in assignment (expression has type\n # \"List[Any]\", variable has type \"Dict[Any, Any]\")\n data = list(data.values()) # type: ignore[assignment]\n elif orient in (\"columns\", \"tight\"):\n if columns is not None:\n raise ValueError(f\"cannot use columns parameter with orient='{orient}'\")\n else: # pragma: no cover\n raise ValueError(\n f\"Expected 'index', 'columns' or 'tight' for orient parameter. \"\n f\"Got '{orient}' instead\"\n )\n\n if orient != \"tight\":\n return cls(data, index=index, columns=columns, dtype=dtype)\n else:\n realdata = data[\"data\"]\n\n def create_index(indexlist, namelist) -> Index:\n index: Index\n if len(namelist) > 1:\n index = MultiIndex.from_tuples(indexlist, names=namelist)\n else:\n index = Index(indexlist, name=namelist[0])\n return index\n\n index = create_index(data[\"index\"], data[\"index_names\"])\n columns = create_index(data[\"columns\"], data[\"column_names\"])\n return cls(realdata, index=index, columns=columns, dtype=dtype)\n\n def to_numpy(\n self,\n dtype: npt.DTypeLike | None = None,\n copy: bool = False,\n na_value: object = lib.no_default,\n ) -> np.ndarray:\n \"\"\"\n Convert the DataFrame to a NumPy array.\n\n By default, the dtype of the returned array will be the common NumPy\n dtype of all types in the DataFrame. 
For example, if the dtypes are\n ``float16`` and ``float32``, the resulting dtype will be ``float32``.\n This may require copying data and coercing values, which may be\n expensive.\n\n Parameters\n ----------\n dtype : str or numpy.dtype, optional\n The dtype to pass to :func:`numpy.asarray`.\n copy : bool, default False\n Whether to ensure that the returned value is not a view on\n another array. Note that ``copy=False`` does not *ensure* that\n ``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that\n a copy is made, even if not strictly necessary.\n na_value : Any, optional\n The value to use for missing values. The default value depends\n on `dtype` and the dtypes of the DataFrame columns.\n\n Returns\n -------\n numpy.ndarray\n The NumPy array representing the values in the DataFrame.\n\n See Also\n --------\n Series.to_numpy : Similar method for Series.\n\n Examples\n --------\n >>> pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]}).to_numpy()\n array([[1, 3],\n [2, 4]])\n\n With heterogeneous data, the lowest common type will have to\n be used.\n\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3.0, 4.5]})\n >>> df.to_numpy()\n array([[1. , 3. ],\n [2. 
, 4.5]])\n\n For a mix of numeric and non-numeric types, the output array will\n have object dtype.\n\n >>> df[\"C\"] = pd.date_range(\"2000\", periods=2)\n >>> df.to_numpy()\n array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],\n [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)\n \"\"\"\n if dtype is not None:\n dtype = np.dtype(dtype)\n result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)\n if result.dtype is not dtype:\n result = np.asarray(result, dtype=dtype)\n\n return result\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> MutableMappingT: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[MutableMappingT] | MutableMappingT,\n index: bool = ...,\n ) -> list[MutableMappingT]: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"dict\", \"list\", \"series\", \"split\", \"tight\", \"index\"] = ...,\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> dict: ...\n\n @overload\n def to_dict(\n self,\n orient: Literal[\"records\"],\n *,\n into: type[dict] = ...,\n index: bool = ...,\n ) -> list[dict]: ...\n\n # error: Incompatible default for argument \"into\" (default has type \"type\n # [dict[Any, Any]]\", argument has type \"type[MutableMappingT] | MutableMappingT\")\n def to_dict(\n self,\n orient: Literal[\n \"dict\", \"list\", \"series\", \"split\", \"tight\", \"records\", \"index\"\n ] = \"dict\",\n *,\n into: type[MutableMappingT] | MutableMappingT = dict, # type: ignore[assignment]\n index: bool = True,\n ) -> MutableMappingT | list[MutableMappingT]:\n \"\"\"\n Convert the DataFrame to a dictionary.\n\n The type of the key-value pairs can be customized with the parameters\n (see below).\n\n Parameters\n ----------\n orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}\n Determines the type of 
the values of the dictionary.\n\n - 'dict' (default) : dict like {column -> {index -> value}}\n - 'list' : dict like {column -> [values]}\n - 'series' : dict like {column -> Series(values)}\n - 'split' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}\n - 'tight' : dict like\n {'index' -> [index], 'columns' -> [columns], 'data' -> [values],\n 'index_names' -> [index.names], 'column_names' -> [column.names]}\n - 'records' : list like\n [{column -> value}, ... , {column -> value}]\n - 'index' : dict like {index -> {column -> value}}\n\n into : class, default dict\n The collections.abc.MutableMapping subclass used for all Mappings\n in the return value. Can be the actual class or an empty\n instance of the mapping type you want. If you want a\n collections.defaultdict, you must pass it initialized.\n\n index : bool, default True\n Whether to include the index item (and index_names item if `orient`\n is 'tight') in the returned dictionary. Can only be ``False``\n when `orient` is 'split' or 'tight'. Note that when `orient` is\n 'records', this parameter does not take effect (index item always\n not included).\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n dict, list or collections.abc.MutableMapping\n Return a collections.abc.MutableMapping object representing the\n DataFrame. The resulting transformation depends on the `orient`\n parameter.\n\n See Also\n --------\n DataFrame.from_dict: Create a DataFrame from a dictionary.\n DataFrame.to_json: Convert a DataFrame to JSON format.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"col1\": [1, 2], \"col2\": [0.5, 0.75]}, index=[\"row1\", \"row2\"]\n ... 
)\n >>> df\n col1 col2\n row1 1 0.50\n row2 2 0.75\n >>> df.to_dict()\n {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}\n\n You can specify the return orientation.\n\n >>> df.to_dict(\"series\")\n {'col1': row1 1\n row2 2\n Name: col1, dtype: int64,\n 'col2': row1 0.50\n row2 0.75\n Name: col2, dtype: float64}\n\n >>> df.to_dict(\"split\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]]}\n\n >>> df.to_dict(\"records\")\n [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]\n\n >>> df.to_dict(\"index\")\n {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}\n\n >>> df.to_dict(\"tight\")\n {'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],\n 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}\n\n You can also specify the mapping type.\n\n >>> from collections import OrderedDict, defaultdict\n >>> df.to_dict(into=OrderedDict)\n OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}),\n 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})\n\n If you want a `defaultdict`, you need to initialize it:\n\n >>> dd = defaultdict(list)\n >>> df.to_dict(\"records\", into=dd)\n [defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),\n defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]\n \"\"\"\n from pandas.core.methods.to_dict import to_dict\n\n return to_dict(self, orient, into=into, index=index)\n\n @classmethod\n def from_records(\n cls,\n data,\n index=None,\n exclude=None,\n columns=None,\n coerce_float: bool = False,\n nrows: int | None = None,\n ) -> DataFrame:\n \"\"\"\n Convert structured or record ndarray to DataFrame.\n\n Creates a DataFrame object from a structured ndarray, or iterable of\n tuples or dicts.\n\n Parameters\n ----------\n data : structured ndarray, iterable of tuples or dicts, or dict\n Structured input data.\n\n .. deprecated:: 3.1.0\n Passing a dict is deprecated. 
Use the DataFrame constructor\n or :meth:`DataFrame.from_dict` instead.\n\n index : str, list of fields, array-like\n Field of array to use as the index, alternately a specific set of\n input labels to use.\n exclude : sequence, default None\n Columns or fields to exclude.\n columns : sequence, default None\n Column names to use. If the passed data do not have names\n associated with them, this argument provides names for the\n columns. Otherwise, this argument indicates the order of the columns\n in the result (any names not found in the data will become all-NA\n columns) and limits the data to these columns if not all column names\n are provided.\n coerce_float : bool, default False\n Attempt to convert values of non-string, non-numeric objects (like\n decimal.Decimal) to floating point, useful for SQL result sets.\n nrows : int, default None\n Number of rows to read if data is an iterator.\n\n Returns\n -------\n DataFrame\n\n See Also\n --------\n DataFrame.from_dict : DataFrame from dict of array-like or dicts.\n DataFrame : DataFrame object creation using constructor.\n\n Examples\n --------\n Data can be provided as a structured ndarray:\n\n >>> data = np.array(\n ... [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")],\n ... dtype=[(\"col_1\", \"i4\"), (\"col_2\", \"U1\")],\n ... )\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of dicts:\n\n >>> data = [\n ... {\"col_1\": 3, \"col_2\": \"a\"},\n ... {\"col_1\": 2, \"col_2\": \"b\"},\n ... {\"col_1\": 1, \"col_2\": \"c\"},\n ... {\"col_1\": 0, \"col_2\": \"d\"},\n ... 
]\n >>> pd.DataFrame.from_records(data)\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n\n Data can be provided as a list of tuples with corresponding columns:\n\n >>> data = [(3, \"a\"), (2, \"b\"), (1, \"c\"), (0, \"d\")]\n >>> pd.DataFrame.from_records(data, columns=[\"col_1\", \"col_2\"])\n col_1 col_2\n 0 3 a\n 1 2 b\n 2 1 c\n 3 0 d\n \"\"\"\n if isinstance(data, DataFrame):\n raise TypeError(\n \"Passing a DataFrame to DataFrame.from_records is not supported. Use \"\n \"set_index and/or drop to modify the DataFrame instead.\",\n )\n\n if isinstance(data, dict):\n warnings.warn(\n \"Passing a dict to DataFrame.from_records is deprecated. \"\n \"Use the DataFrame constructor or DataFrame.from_dict instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n result_index = None\n\n # Make a copy of the input columns so we can modify it\n if columns is not None:\n columns = ensure_index(columns)\n\n def maybe_reorder(\n arrays: list[ArrayLike], arr_columns: Index, columns: Index, index\n ) -> tuple[list[ArrayLike], Index, Index | None]:\n \"\"\"\n If our desired 'columns' do not match the data's pre-existing 'arr_columns',\n we re-order our arrays. 
This is like a preemptive (cheap) reindex.\n \"\"\"\n if len(arrays):\n length = len(arrays[0])\n else:\n length = 0\n\n result_index = None\n if len(arrays) == 0 and index is None and length == 0:\n result_index = default_index(0)\n\n arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns, length)\n return arrays, arr_columns, result_index\n\n if is_iterator(data):\n if nrows == 0:\n if columns is not None and exclude is not None:\n columns = columns.drop(exclude)\n return cls(index=index, columns=columns)\n\n try:\n first_row = next(data)\n except StopIteration:\n return cls(index=index, columns=columns)\n\n dtype = None\n if hasattr(first_row, \"dtype\") and first_row.dtype.names:\n dtype = first_row.dtype\n\n values = [first_row]\n\n if nrows is None:\n values += data\n else:\n values.extend(itertools.islice(data, nrows - 1))\n\n if dtype is not None:\n data = np.array(values, dtype=dtype)\n else:\n data = values\n\n if isinstance(data, dict):\n if columns is None:\n columns = arr_columns = ensure_index(sorted(data))\n arrays = [data[k] for k in columns]\n else:\n arrays = []\n arr_columns_list = []\n for k, v in data.items():\n if k in columns:\n arr_columns_list.append(k)\n arrays.append(v)\n\n arr_columns = Index(arr_columns_list)\n arrays, arr_columns, result_index = maybe_reorder(\n arrays, arr_columns, columns, index\n )\n\n elif isinstance(data, np.ndarray):\n arrays, columns = to_arrays(data, columns)\n arr_columns = columns\n else:\n arrays, arr_columns = to_arrays(data, columns)\n if coerce_float:\n for i, arr in enumerate(arrays):\n if arr.dtype == object:\n # error: Argument 1 to \"maybe_convert_objects\" has\n # incompatible type \"Union[ExtensionArray, ndarray]\";\n # expected \"ndarray\"\n arrays[i] = lib.maybe_convert_objects(\n arr, # type: ignore[arg-type]\n try_float=True,\n )\n\n arr_columns = ensure_index(arr_columns)\n if columns is None:\n columns = arr_columns\n else:\n arrays, arr_columns, result_index = maybe_reorder(\n 
arrays, arr_columns, columns, index\n )\n\n if exclude is None:\n exclude = set()\n else:\n exclude = set(exclude)\n\n if index is not None:\n if isinstance(index, str) or not hasattr(index, \"__iter__\"):\n i = columns.get_loc(index)\n exclude.add(index)\n if len(arrays) > 0:\n result_index = Index(arrays[i], name=index)\n else:\n result_index = Index([], name=index)\n else:\n try:\n index_data = [arrays[arr_columns.get_loc(field)] for field in index]\n except (KeyError, TypeError):\n # raised by get_loc, see GH#29258\n result_index = index\n else:\n result_index = ensure_index_from_sequences(index_data, names=index)\n exclude.update(index)\n\n if any(exclude):\n arr_exclude = (x for x in exclude if x in arr_columns)\n to_remove = {\n arr_columns.get_loc(col)\n for col in arr_exclude # pyright: ignore[reportUnhashable]\n }\n arrays = [v for i, v in enumerate(arrays) if i not in to_remove]\n\n columns = columns.drop(exclude)\n\n mgr = arrays_to_mgr(arrays, columns, result_index)\n df = DataFrame._from_mgr(mgr, axes=mgr.axes)\n if cls is not DataFrame:\n return cls(df, copy=False)\n return df\n\n def to_records(\n self, index: bool = True, column_dtypes=None, index_dtypes=None\n ) -> np.rec.recarray:\n \"\"\"\n Convert DataFrame to a NumPy record array.\n\n Index will be included as the first field of the record array if\n requested.\n\n Parameters\n ----------\n index : bool, default True\n Include index in resulting record array, stored in 'index'\n field or using the index label, if set.\n column_dtypes : str, type, dict, default None\n If a string or type, the data type to store all columns. If\n a dictionary, a mapping of column names and indices (zero-indexed)\n to specific data types.\n index_dtypes : str, type, dict, default None\n If a string or type, the data type to store all index levels. 
If\n a dictionary, a mapping of index level names and indices\n (zero-indexed) to specific data types.\n\n This mapping is applied only if `index=True`.\n\n Returns\n -------\n numpy.rec.recarray\n NumPy ndarray with the DataFrame labels as fields and each row\n of the DataFrame as entries.\n\n See Also\n --------\n DataFrame.from_records: Convert structured or record ndarray\n to DataFrame.\n numpy.rec.recarray: An ndarray that allows field access using\n attributes, analogous to typed columns in a\n spreadsheet.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [0.5, 0.75]}, index=[\"a\", \"b\"])\n >>> df\n A B\n a 1 0.50\n b 2 0.75\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('index', 'O'), ('A', '>> df.index = df.index.rename(\"I\")\n >>> df.to_records()\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index=False)\n rec.array([(1, 0.5 ), (2, 0.75)],\n dtype=[('A', '>> df.to_records(column_dtypes={\"A\": \"int32\"})\n rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],\n dtype=[('I', 'O'), ('A', '>> df.to_records(index_dtypes=\">> index_dtypes = f\">> df.to_records(index_dtypes=index_dtypes)\n rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],\n dtype=[('I', 'S1'), ('A', ' Self:\n \"\"\"\n Create DataFrame from a list of arrays corresponding to the columns.\n\n Parameters\n ----------\n arrays : list-like of arrays\n Each array in the list corresponds to one column, in order.\n columns : list-like, Index\n The column names for the resulting DataFrame.\n index : list-like, Index\n The rows labels for the resulting DataFrame.\n dtype : dtype, optional\n Optional dtype to enforce for all arrays.\n verify_integrity : bool, default True\n Validate and homogenize all input. 
If set to False, it is assumed\n that all elements of `arrays` are actual arrays how they will be\n stored in a block (numpy ndarray or ExtensionArray), have the same\n length as and are aligned with the index, and that `columns` and\n `index` are ensured to be an Index object.\n\n Returns\n -------\n DataFrame\n \"\"\"\n if dtype is not None:\n dtype = pandas_dtype(dtype)\n\n columns = ensure_index(columns)\n if len(columns) != len(arrays):\n raise ValueError(\"len(columns) must match len(arrays)\")\n mgr = arrays_to_mgr(\n arrays,\n columns,\n index,\n dtype=dtype,\n verify_integrity=verify_integrity,\n )\n return cls._from_mgr(mgr, axes=mgr.axes)\n\n def to_stata(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n convert_dates: dict[Hashable, str] | None = None,\n write_index: bool = True,\n byteorder: ToStataByteorder | None = None,\n time_stamp: datetime.datetime | None = None,\n data_label: str | None = None,\n variable_labels: dict[Hashable, str] | None = None,\n version: int | None = 114,\n convert_strl: Sequence[Hashable] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n value_labels: dict[Hashable, dict[float, str]] | None = None,\n ) -> None:\n \"\"\"\n Export DataFrame object to Stata dta format.\n\n Writes the DataFrame to a Stata dataset file.\n \"dta\" files contain a Stata dataset.\n\n Parameters\n ----------\n path : str, path object, or buffer\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function.\n\n convert_dates : dict\n Dictionary mapping columns containing datetime types to stata\n internal format to use when writing the dates. Options are 'tc',\n 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer\n or a name. Datetime columns that do not have a conversion type\n specified will be converted to 'tc'. 
Raises NotImplementedError if\n a datetime column has timezone information.\n write_index : bool\n Write the index to Stata dataset.\n byteorder : str\n Can be \">\", \"<\", \"little\", or \"big\". default is `sys.byteorder`.\n time_stamp : datetime\n A datetime to use as file creation date. Default is the current\n time.\n data_label : str, optional\n A label for the data set. Must be 80 characters or smaller.\n variable_labels : dict\n Dictionary containing columns as keys and variable labels as\n values. Each label must be 80 characters or smaller.\n version : {114, 117, 118, 119, None}, default 114\n Version to use in the output dta file. Set to None to let pandas\n decide between 118 or 119 formats depending on the number of\n columns in the frame. Version 114 can be read by Stata 10 and\n later. Version 117 can be read by Stata 13 or later. Version 118\n is supported in Stata 14 and later. Version 119 is supported in\n Stata 15 and later. Version 114 limits string variables to 244\n characters or fewer while versions 117 and later allow strings\n with lengths up to 2,000,000 characters. Versions 118 and 119\n support Unicode characters, and version 119 supports more than\n 32,767 variables.\n\n Version 119 should usually only be used when the number of\n variables exceeds the capacity of dta format 118. Exporting\n smaller datasets in format 119 may have unintended consequences,\n and, as of November 2020, Stata SE cannot read version 119 files.\n\n convert_strl : list, optional\n List of column names to convert to string columns to Stata StrL\n format. Only available if version is 117. Storing strings in the\n StrL format can produce smaller dta files if strings have more than\n 8 characters and values are repeated.\n\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. 
If 'infer' and 'path' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n value_labels : dict of dicts\n Dictionary containing columns as keys and dictionaries of column value\n to labels as values. 
Labels for a single variable must be 32,000\n characters or smaller.\n\n Raises\n ------\n NotImplementedError\n * If datetimes contain timezone information\n * Column dtype is not representable in Stata\n ValueError\n * Columns listed in convert_dates are neither datetime64[ns]\n or datetime.datetime\n * Column listed in convert_dates is not in DataFrame\n * Categorical label contains more than 32,000 characters\n\n See Also\n --------\n read_stata : Import Stata data files.\n io.stata.StataWriter : Low-level writer for Stata data files.\n io.stata.StataWriter117 : Low-level writer for version 117 files.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"falcon\", 350], [\"parrot\", 18]], columns=[\"animal\", \"parrot\"]\n ... )\n >>> df.to_stata(\"animals.dta\") # doctest: +SKIP\n \"\"\"\n if version not in (114, 117, 118, 119, None):\n raise ValueError(\"Only formats 114, 117, 118 and 119 are supported.\")\n if version == 114:\n if convert_strl is not None:\n raise ValueError(\"strl is not supported in format 114\")\n from pandas.io.stata import StataWriter as statawriter\n elif version == 117:\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriter117 as statawriter,\n )\n else: # versions 118 and 119\n # Incompatible import of \"statawriter\" (imported name has type\n # \"Type[StataWriter117]\", local name has type \"Type[StataWriter]\")\n from pandas.io.stata import ( # type: ignore[assignment]\n StataWriterUTF8 as statawriter,\n )\n\n kwargs: dict[str, Any] = {}\n if version is None or version >= 117:\n # strl conversion is only supported >= 117\n kwargs[\"convert_strl\"] = convert_strl\n if version is None or version >= 118:\n # Specifying the version is only supported for UTF8 (118 or 119)\n kwargs[\"version\"] = version\n\n writer = statawriter(\n path,\n self,\n convert_dates=convert_dates,\n 
byteorder=byteorder,\n time_stamp=time_stamp,\n data_label=data_label,\n write_index=write_index,\n variable_labels=variable_labels,\n compression=compression,\n storage_options=storage_options,\n value_labels=value_labels,\n **kwargs,\n )\n writer.write_file()\n\n def to_feather(self, path: FilePath | WriteBuffer[bytes], **kwargs) -> None:\n \"\"\"\n Write a DataFrame to the binary Feather format.\n\n The Feather format is a lightweight, language-agnostic columnar file\n format based on Apache Arrow, designed for efficient read and write\n performance. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, path object, file-like object\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If a string or a path,\n it will be used as Root Directory path when writing a partitioned dataset.\n **kwargs :\n Additional keywords passed to :func:`pyarrow.feather.write_feather`.\n This includes the `compression`, `compression_level`, `chunksize`\n and `version` keywords.\n\n See Also\n --------\n DataFrame.to_parquet : Write a DataFrame to the binary parquet format.\n DataFrame.to_excel : Write object to an Excel sheet.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_json : Convert the object to a JSON string.\n DataFrame.to_html : Render a DataFrame as an HTML table.\n DataFrame.to_string : Convert DataFrame to a string.\n\n Notes\n -----\n This function writes the dataframe as a `feather file\n `_. Requires a default\n index. For saving the DataFrame with your custom index use a method that\n supports custom indices e.g. 
`to_parquet`.\n\n Examples\n --------\n >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n >>> df.to_feather(\"file.feather\") # doctest: +SKIP\n \"\"\"\n from pandas.io.feather_format import to_feather\n\n to_feather(self, path, **kwargs)\n\n @overload\n def to_markdown(\n self,\n buf: None = ...,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> None: ...\n\n @overload\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None,\n *,\n mode: str = ...,\n index: bool = ...,\n storage_options: StorageOptions | None = ...,\n **kwargs,\n ) -> str | None: ...\n\n def to_markdown(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n mode: str = \"wt\",\n index: bool = True,\n storage_options: StorageOptions | None = None,\n **kwargs,\n ) -> str | None:\n \"\"\"\n Print DataFrame in Markdown-friendly format.\n\n Generates a Markdown table representation of the\n DataFrame using the ``tabulate`` library. The result can be written\n to a file or returned as a string for embedding in Markdown documents.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n mode : str, optional\n Mode in which file is opened, \"wt\" by default.\n index : bool, optional, default True\n Add index (row) labels.\n\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. 
Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n **kwargs\n These parameters will be passed to `tabulate `_.\n\n Returns\n -------\n str\n DataFrame in Markdown-friendly format.\n\n See Also\n --------\n DataFrame.to_html : Render DataFrame to HTML-formatted table.\n DataFrame.to_latex : Render DataFrame to LaTeX-formatted table.\n\n Notes\n -----\n Requires the `tabulate `_ package.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... data={\"animal_1\": [\"elk\", \"pig\"], \"animal_2\": [\"dog\", \"quetzal\"]}\n ... )\n >>> print(df.to_markdown())\n | | animal_1 | animal_2 |\n |---:|:-----------|:-----------|\n | 0 | elk | dog |\n | 1 | pig | quetzal |\n\n Output markdown with a tabulate option.\n\n >>> print(df.to_markdown(tablefmt=\"grid\"))\n +----+------------+------------+\n | | animal_1 | animal_2 |\n +====+============+============+\n | 0 | elk | dog |\n +----+------------+------------+\n | 1 | pig | quetzal |\n +----+------------+------------+\n \"\"\"\n if \"showindex\" in kwargs:\n raise ValueError(\"Pass 'index' instead of 'showindex\")\n\n kwargs.setdefault(\"headers\", \"keys\")\n kwargs.setdefault(\"tablefmt\", \"pipe\")\n kwargs.setdefault(\"showindex\", index)\n tabulate = import_optional_dependency(\"tabulate\")\n result = tabulate.tabulate(self, **kwargs)\n if buf is None:\n return result\n\n with get_handle(buf, mode, storage_options=storage_options) as handles:\n handles.handle.write(result)\n return None\n\n @overload\n def to_parquet(\n self,\n path: None = ...,\n *,\n engine: Literal[\"auto\", \"pyarrow\", \"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> bytes: ...\n\n @overload\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"auto\", \"pyarrow\", 
\"fastparquet\"] = ...,\n compression: ParquetCompressionOptions = ...,\n index: bool | None = ...,\n partition_cols: list[str] | None = ...,\n storage_options: StorageOptions = ...,\n filesystem: Any = ...,\n **kwargs,\n ) -> None: ...\n\n def to_parquet(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: (\n Literal[\"auto\", \"pyarrow\", \"fastparquet\"] | lib.NoDefault\n ) = lib.no_default,\n compression: ParquetCompressionOptions = \"snappy\",\n index: bool | None = None,\n partition_cols: list[str] | None = None,\n storage_options: StorageOptions | None = None,\n filesystem: Any = None,\n **kwargs,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the binary parquet format.\n\n This function writes the dataframe as a `parquet file\n `_. You can choose different parquet\n backends, and have the option of compression. See\n :ref:`the user guide ` for more details.\n\n Parameters\n ----------\n path : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a binary ``write()`` function. If None, the result is\n returned as bytes. If a string or path, it will be used as the root\n directory path when writing a partitioned dataset.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.parquet``. A remote example could be:\n ``s3://bucket/path/to/table.parquet``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'\n Parquet library to use. If 'auto', then the option\n ``io.parquet.engine`` is used. The default ``io.parquet.engine``\n behavior is to try 'pyarrow', falling back to 'fastparquet' if\n 'pyarrow' is unavailable.\n\n .. 
deprecated:: 3.1.0\n The ``'fastparquet'`` and ``'auto'`` engine options are\n deprecated. Use ``'pyarrow'`` or do not pass ``engine``\n to use the default.\n\n compression : str or None, default 'snappy'\n Name of the compression to use. Use ``None`` for no compression.\n Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.\n index : bool, default None\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``True`` the dataframe's index(es)\n will be saved. However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n partition_cols : list, optional, default None\n Column names by which to partition the dataset.\n Columns are partitioned in the order they are given.\n Must be None if path is not a string.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n filesystem : fsspec or pyarrow filesystem, default None\n Filesystem object to use when reading the parquet file. Only implemented\n for ``engine=\"pyarrow\"``.\n\n .. versionadded:: 2.1.0\n\n **kwargs\n Additional arguments passed to the parquet library. See\n :ref:`pandas io ` for more details.\n\n Returns\n -------\n bytes if no path argument is provided else None\n Returns the DataFrame converted to the binary parquet format as bytes if no\n path argument. 
Returns None and writes the DataFrame to the specified\n location in the Parquet format if the path argument is provided.\n\n See Also\n --------\n read_parquet : Read a parquet file.\n DataFrame.to_orc : Write an orc file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * This function requires either the `fastparquet\n `_ or `pyarrow\n `_ library.\n * When saving a DataFrame with categorical columns to parquet,\n the file size may increase due to the inclusion of all possible\n categories, not just those present in the data. This behavior\n is expected and consistent with pandas' handling of categorical data.\n To manage file size and ensure a more predictable roundtrip process,\n consider using :meth:`Categorical.remove_unused_categories` on the\n DataFrame before saving.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df.to_parquet(\"df.parquet.gzip\", compression=\"gzip\") # doctest: +SKIP\n >>> pd.read_parquet(\"df.parquet.gzip\") # doctest: +SKIP\n col1 col2\n 0 1 3\n 1 2 4\n\n If you want to get a buffer to the parquet content you can use a io.BytesIO\n object, as long as you don't use partition_cols, which creates multiple files.\n\n >>> import io\n >>> f = io.BytesIO()\n >>> df.to_parquet(f)\n >>> f.seek(0)\n 0\n >>> content = f.read()\n \"\"\"\n from pandas.io.parquet import to_parquet\n\n return to_parquet(\n self,\n path,\n engine,\n compression=compression,\n index=index,\n partition_cols=partition_cols,\n storage_options=storage_options,\n filesystem=filesystem,\n **kwargs,\n )\n\n @overload\n def to_orc(\n self,\n path: None = ...,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes],\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n 
engine_kwargs: dict[str, Any] | None = ...,\n ) -> None: ...\n\n @overload\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None,\n *,\n engine: Literal[\"pyarrow\"] = ...,\n index: bool | None = ...,\n engine_kwargs: dict[str, Any] | None = ...,\n ) -> bytes | None: ...\n\n def to_orc(\n self,\n path: FilePath | WriteBuffer[bytes] | None = None,\n *,\n engine: Literal[\"pyarrow\"] = \"pyarrow\",\n index: bool | None = None,\n engine_kwargs: dict[str, Any] | None = None,\n ) -> bytes | None:\n \"\"\"\n Write a DataFrame to the Optimized Row Columnar (ORC) format.\n\n ORC is a self-describing, type-aware columnar file format designed\n for Hadoop workloads. It provides efficient compression and encoding\n schemes, making it well-suited for large-scale data storage and\n analytics. This method requires the ``pyarrow`` library.\n\n Parameters\n ----------\n path : str, file-like object or None, default None\n If a string, it will be used as the root directory path\n when writing a partitioned dataset. By file-like object,\n we refer to objects with a write() method, such as a file handle\n (e.g. via builtin open function). If path is None,\n a bytes object is returned.\n\n The string could be a URL. Valid URL schemes include http, ftp, s3,\n gs, and file. For file URLs, a host is expected. A local file could be:\n ``file://localhost/path/to/table.orc``. A remote example could be:\n ``s3://bucket/path/to/table.orc``.\n\n Certain URL schemes may require additional packages. For example, S3\n URLs require the ``s3fs`` library. See\n :ref:`install.optional_dependencies` for a full list.\n engine : {'pyarrow'}, default 'pyarrow'\n ORC library to use.\n index : bool, optional\n If ``True``, include the dataframe's index(es) in the file output.\n If ``False``, they will not be written to the file.\n If ``None``, similar to ``infer`` the dataframe's index(es)\n will be saved. 
However, instead of being saved as values,\n the RangeIndex will be stored as a range in the metadata so it\n doesn't require much space and is faster. Other indexes will\n be included as columns in the file output.\n engine_kwargs : dict[str, Any] or None, default None\n Additional keyword arguments passed to :func:`pyarrow.orc.write_table`.\n\n Returns\n -------\n bytes if no ``path`` argument is provided else None\n Bytes object with DataFrame data if ``path`` is not specified else None.\n\n Raises\n ------\n NotImplementedError\n Dtype of one or more columns is category, unsigned integers, interval,\n period or sparse.\n ValueError\n engine is not pyarrow.\n\n See Also\n --------\n read_orc : Read a ORC file.\n DataFrame.to_parquet : Write a parquet file.\n DataFrame.to_csv : Write a csv file.\n DataFrame.to_sql : Write to a sql table.\n DataFrame.to_hdf : Write to hdf.\n\n Notes\n -----\n * Find more information on ORC\n `here `__.\n * Before using this function you should read the :ref:`user guide about\n ORC ` and :ref:`install optional dependencies `.\n * This function requires `pyarrow `_\n library.\n * For supported dtypes please refer to `supported ORC features in Arrow\n `__.\n * Currently timezones in datetime columns are not preserved when a\n dataframe is converted into ORC files.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_orc(\"df.orc\") # doctest: +SKIP\n >>> pd.read_orc(\"df.orc\") # doctest: +SKIP\n col1 col2\n 0 1 4\n 1 2 3\n\n If you want to get a buffer to the orc content you can write it to io.BytesIO\n\n >>> import io\n >>> b = io.BytesIO(df.to_orc()) # doctest: +SKIP\n >>> b.seek(0) # doctest: +SKIP\n 0\n >>> content = b.read() # doctest: +SKIP\n \"\"\"\n from pandas.io.orc import to_orc\n\n return to_orc(\n self, path, engine=engine, index=index, engine_kwargs=engine_kwargs\n )\n\n @overload\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str],\n *,\n columns: Axes | None = 
...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> None: ...\n\n @overload\n def to_html(\n self,\n buf: None = ...,\n *,\n columns: Axes | None = ...,\n col_space: ColspaceArgType | None = ...,\n header: bool = ...,\n index: bool = ...,\n na_rep: str = ...,\n formatters: FormattersType | None = ...,\n float_format: FloatFormatType | None = ...,\n sparsify: bool | None = ...,\n index_names: bool = ...,\n justify: str | None = ...,\n max_rows: int | None = ...,\n max_cols: int | None = ...,\n show_dimensions: bool | str = ...,\n decimal: str = ...,\n bold_rows: bool = ...,\n classes: str | list | tuple | None = ...,\n escape: bool = ...,\n notebook: bool = ...,\n border: int | bool | None = ...,\n table_id: str | None = ...,\n render_links: bool = ...,\n encoding: str | None = ...,\n ) -> str: ...\n\n def to_html(\n self,\n buf: FilePath | WriteBuffer[str] | None = None,\n *,\n columns: Axes | None = None,\n col_space: ColspaceArgType | None = None,\n header: bool = True,\n index: bool = True,\n na_rep: str = \"NaN\",\n formatters: FormattersType | None = None,\n float_format: FloatFormatType | None = None,\n sparsify: bool | None = None,\n index_names: bool = True,\n justify: str | None = None,\n max_rows: int | None = None,\n max_cols: int | None = None,\n show_dimensions: bool | str = False,\n decimal: str = \".\",\n bold_rows: bool = True,\n classes: str | list | tuple | 
None = None,\n escape: bool = True,\n notebook: bool = False,\n border: int | bool | None = None,\n table_id: str | None = None,\n render_links: bool = False,\n encoding: str | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame as an HTML table.\n\n Converts the DataFrame into an HTML ``<table>`` element. The resulting\n HTML can be written to a file or returned as a string. This is useful\n for embedding tabular data in web pages or HTML-based reports.\n\n Parameters\n ----------\n buf : str, Path or StringIO-like, optional, default None\n Buffer to write to. If None, the output is returned as a string.\n columns : array-like, optional, default None\n The subset of columns to write. Writes all columns by default.\n col_space : str or int, list or dict of int or str, optional\n The minimum width of each column in CSS length units. An int is\n assumed to be px units.\n header : bool, optional\n Whether to print column labels, default True.\n index : bool, optional, default True\n Whether to print index (row) labels.\n na_rep : str, optional, default 'NaN'\n String representation of ``NaN`` to use.\n formatters : list, tuple or dict of one-param. functions, optional\n Formatter functions to apply to columns' elements by position or\n name.\n The result of each function must be a unicode string.\n List/tuple must be of length equal to the number of columns.\n float_format : one-parameter function, optional, default None\n Formatter function to apply to columns' elements if they are\n floats. This function must return a unicode string and will be\n applied only to the non-``NaN`` elements, with ``NaN`` being\n handled by ``na_rep``.\n sparsify : bool, optional, default True\n Set to False for a DataFrame with a hierarchical index to print\n every multiindex key at each row.\n index_names : bool, optional, default True\n Prints the names of the indexes.\n justify : str, default None\n How to justify the column labels. 
If None uses the option from\n the print configuration (controlled by set_option), 'right' out\n of the box. Valid values are\n\n * left\n * right\n * center\n * justify\n * justify-all\n * start\n * end\n * inherit\n * match-parent\n * initial\n * unset.\n max_rows : int, optional\n Maximum number of rows to display in the console.\n max_cols : int, optional\n Maximum number of columns to display in the console.\n show_dimensions : bool, default False\n Display DataFrame dimensions (number of rows by number of columns).\n decimal : str, default '.'\n Character recognized as decimal separator, e.g. ',' in Europe.\n\n bold_rows : bool, default True\n Make the row labels bold in the output.\n classes : str or list or tuple, default None\n CSS class(es) to apply to the resulting html table.\n escape : bool, default True\n Convert the characters <, >, and & to HTML-safe sequences.\n notebook : {True, False}, default False\n Whether the generated HTML is for IPython Notebook.\n border : int or bool\n When an integer value is provided, it sets the border attribute in\n the opening tag, specifying the thickness of the border.\n If ``False`` or ``0`` is passed, the border attribute will not\n be present in the ``
<table>`` tag.\n The default value for this parameter is governed by\n ``pd.options.display.html.border``.\n table_id : str, optional\n A css id is included in the opening `<table>
` tag if specified.\n render_links : bool, default False\n Convert URLs to HTML links.\n encoding : str, default \"utf-8\"\n Set character encoding.\n\n Returns\n -------\n str or None\n If buf is None, returns the result as a string. Otherwise returns\n None.\n\n See Also\n --------\n to_string : Convert DataFrame to a string.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html()\n >>> print(html_string)\n
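<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>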
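\n\n HTML output\n\n +----+-----+-----+\n |    |col1 |col2 |\n +====+=====+=====+\n |0   |1    |4    |\n +----+-----+-----+\n |1   |2    |3    |\n +----+-----+-----+\n\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> html_string = df.to_html(index=False)\n >>> print(html_string)\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>col1</th>\n <th>col2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>1</td>\n <td>4</td>\n </tr>\n <tr>\n <td>2</td>\n <td>3</td>\n </tr>\n </tbody>\n </table>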
\n\n HTML output\n\n +-----+-----+\n |col1 |col2 |\n +=====+=====+\n |1 |4 |\n +-----+-----+\n |2 |3 |\n +-----+-----+\n \"\"\"\n if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:\n raise ValueError(\"Invalid value for justify parameter\")\n\n formatter = fmt.DataFrameFormatter(\n self,\n columns=columns,\n col_space=col_space,\n na_rep=na_rep,\n header=header,\n index=index,\n formatters=formatters,\n float_format=float_format,\n bold_rows=bold_rows,\n sparsify=sparsify,\n justify=justify,\n index_names=index_names,\n escape=escape,\n decimal=decimal,\n max_rows=max_rows,\n max_cols=max_cols,\n show_dimensions=show_dimensions,\n )\n # TODO: a generic formatter wld b in DataFrameFormatter\n return fmt.DataFrameRenderer(formatter).to_html(\n buf=buf,\n classes=classes,\n notebook=notebook,\n border=border,\n encoding=encoding,\n table_id=table_id,\n render_links=render_links,\n )\n\n @overload\n def to_xml(\n self,\n path_or_buffer: None = ...,\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> str: ...\n\n @overload\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str],\n *,\n index: bool = ...,\n root_name: str | None = ...,\n row_name: str | None = ...,\n na_rep: str | None = ...,\n attr_cols: list[str] | None = ...,\n elem_cols: list[str] | None = ...,\n namespaces: dict[str | None, str] | None = ...,\n prefix: str | None = ...,\n encoding: str = ...,\n xml_declaration: bool | None = ...,\n 
pretty_print: bool | None = ...,\n parser: XMLParsers | None = ...,\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = ...,\n compression: CompressionOptions = ...,\n storage_options: StorageOptions | None = ...,\n ) -> None: ...\n\n def to_xml(\n self,\n path_or_buffer: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None,\n *,\n index: bool = True,\n root_name: str | None = \"data\",\n row_name: str | None = \"row\",\n na_rep: str | None = None,\n attr_cols: list[str] | None = None,\n elem_cols: list[str] | None = None,\n namespaces: dict[str | None, str] | None = None,\n prefix: str | None = None,\n encoding: str = \"utf-8\",\n xml_declaration: bool | None = True,\n pretty_print: bool | None = True,\n parser: XMLParsers | None = \"lxml\",\n stylesheet: FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None = None,\n compression: CompressionOptions = \"infer\",\n storage_options: StorageOptions | None = None,\n ) -> str | None:\n \"\"\"\n Render a DataFrame to an XML document.\n\n Produces an XML representation of the DataFrame where each row becomes\n an XML element. Column values can be mapped to either XML element text\n or attributes, and the output supports namespaces, XSLT stylesheets,\n and custom root/row element names.\n\n Parameters\n ----------\n path_or_buffer : str, path object, file-like object, or None, default None\n String, path object (implementing ``os.PathLike[str]``), or file-like\n object implementing a ``write()`` function. 
If None, the result is returned\n as a string.\n index : bool, default True\n Whether to include index in XML document.\n root_name : str, default 'data'\n The name of root element in XML document.\n row_name : str, default 'row'\n The name of row element in XML document.\n na_rep : str, optional\n Missing data representation.\n attr_cols : list-like, optional\n List of columns to write as attributes in row element.\n Hierarchical columns will be flattened with underscore\n delimiting the different levels.\n elem_cols : list-like, optional\n List of columns to write as children in row element. By default,\n all columns output as children of row element. Hierarchical\n columns will be flattened with underscore delimiting the\n different levels.\n namespaces : dict, optional\n All namespaces to be defined in root element. Keys of dict\n should be prefix names and values of dict corresponding URIs.\n Default namespaces should be given empty string key. For\n example, ::\n\n namespaces = {\"\": \"https://example.com\"}\n\n prefix : str, optional\n Namespace prefix to be used for every element and/or attribute\n in document. This should be one of the keys in ``namespaces``\n dict.\n encoding : str, default 'utf-8'\n Encoding of the resulting document.\n xml_declaration : bool, default True\n Whether to include the XML declaration at start of document.\n pretty_print : bool, default True\n Whether output should be pretty printed with indentation and\n line breaks.\n parser : {'lxml','etree'}, default 'lxml'\n Parser module to use for building of tree. Only 'lxml' and\n 'etree' are supported. With 'lxml', the ability to use XSLT\n stylesheet is supported.\n stylesheet : str, path object or file-like object, optional\n A URL, file-like object, or a raw string containing an XSLT\n script used to transform the raw XML output. Script should use\n layout of elements and attributes from original output. This\n argument requires ``lxml`` to be installed. 
Only XSLT 1.0\n scripts, and not later versions, are currently supported.\n compression : str or dict, default 'infer'\n For on-the-fly compression of the output data. If 'infer' and\n 'path_or_buffer' is\n path-like, then detect compression from the following extensions: '.gz',\n '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n (otherwise no compression).\n Set to ``None`` for no compression.\n Can also be a dict with key ``'method'`` set to one of\n {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'xz'``, ``'tar'``} and\n other key-value pairs are forwarded to\n ``zipfile.ZipFile``, ``gzip.GzipFile``,\n ``bz2.BZ2File``, ``zstandard.ZstdCompressor``, ``lzma.LZMAFile`` or\n ``tarfile.TarFile``, respectively.\n As an example, the following could be passed for faster compression and\n to create a reproducible gzip archive:\n ``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.\n storage_options : dict, optional\n Extra options that make sense for a particular storage connection, e.g.\n host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n are forwarded to ``urllib.request.Request`` as header options. For other\n URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n details, and for more examples on storage options refer `here\n `_.\n\n Returns\n -------\n None or str\n If ``path_or_buffer`` is None, returns the resulting XML document as a\n string. Otherwise returns None.\n\n See Also\n --------\n to_json : Convert the pandas object to a JSON string.\n to_html : Convert DataFrame to an HTML table.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[\"square\", 360, 4], [\"circle\", 360, np.nan], [\"triangle\", 180, 3]],\n ... columns=[\"shape\", \"degrees\", \"sides\"],\n ... 
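)\n\n >>> df.to_xml() # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row>\n <index>0</index>\n <shape>square</shape>\n <degrees>360</degrees>\n <sides>4.0</sides>\n </row>\n <row>\n <index>1</index>\n <shape>circle</shape>\n <degrees>360</degrees>\n <sides/>\n </row>\n <row>\n <index>2</index>\n <shape>triangle</shape>\n <degrees>180</degrees>\n <sides>3.0</sides>\n </row>\n </data>\n\n >>> df.to_xml(\n ... attr_cols=[\"index\", \"shape\", \"degrees\", \"sides\"]\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <data>\n <row index=\"0\" shape=\"square\" degrees=\"360\" sides=\"4.0\"/>\n <row index=\"1\" shape=\"circle\" degrees=\"360\"/>\n <row index=\"2\" shape=\"triangle\" degrees=\"180\" sides=\"3.0\"/>\n </data>\n\n >>> df.to_xml(\n ... namespaces={\"doc\": \"https://example.com\"}, prefix=\"doc\"\n ... ) # doctest: +SKIP\n <?xml version='1.0' encoding='utf-8'?>\n <doc:data xmlns:doc=\"https://example.com\">\n <doc:row>\n <doc:index>0</doc:index>\n <doc:shape>square</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides>4.0</doc:sides>\n </doc:row>\n <doc:row>\n <doc:index>1</doc:index>\n <doc:shape>circle</doc:shape>\n <doc:degrees>360</doc:degrees>\n <doc:sides/>\n </doc:row>\n <doc:row>\n <doc:index>2</doc:index>\n <doc:shape>triangle</doc:shape>\n <doc:degrees>180</doc:degrees>\n <doc:sides>3.0</doc:sides>\n </doc:row>\n </doc:data>\n \"\"\"\n\n from pandas.io.formats.xml import (\n EtreeXMLFormatter,\n LxmlXMLFormatter,\n )\n\n lxml = import_optional_dependency(\"lxml.etree\", errors=\"ignore\")\n\n TreeBuilder: type[EtreeXMLFormatter | LxmlXMLFormatter]\n\n if parser == \"lxml\":\n if lxml is not None:\n TreeBuilder = LxmlXMLFormatter\n else:\n raise ImportError(\n \"lxml not found, please install or use the etree parser.\"\n )\n\n elif parser == \"etree\":\n TreeBuilder = EtreeXMLFormatter\n\n else:\n raise ValueError(\"Values for parser can only be lxml or etree.\")\n\n xml_formatter = TreeBuilder(\n self,\n path_or_buffer=path_or_buffer,\n index=index,\n root_name=root_name,\n row_name=row_name,\n na_rep=na_rep,\n attr_cols=attr_cols,\n elem_cols=elem_cols,\n namespaces=namespaces,\n prefix=prefix,\n encoding=encoding,\n xml_declaration=xml_declaration,\n pretty_print=pretty_print,\n stylesheet=stylesheet,\n compression=compression,\n storage_options=storage_options,\n )\n\n return xml_formatter.write_output()\n\n def to_iceberg(\n self,\n table_identifier: str,\n catalog_name: str | None = None,\n *,\n catalog_properties: dict[str, Any] | None = None,\n location: str | None = None,\n append: bool = False,\n snapshot_properties: dict[str, str] | None = None,\n ) -> None:\n \"\"\"\n Write a DataFrame to an Apache Iceberg table.\n\n .. versionadded:: 3.0.0\n\n .. 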
warning::\n\n to_iceberg is experimental and may change without warning.\n\n Parameters\n ----------\n table_identifier : str\n Table identifier.\n catalog_name : str, optional\n The name of the catalog.\n catalog_properties : dict of {str: str}, optional\n The properties that are used next to the catalog configuration.\n location : str, optional\n Location for the table.\n append : bool, default False\n If ``True``, append data to the table, instead of replacing the content.\n snapshot_properties : dict of {str: str}, optional\n Custom properties to be added to the snapshot summary\n\n See Also\n --------\n read_iceberg : Read an Apache Iceberg table.\n DataFrame.to_parquet : Write a DataFrame in Parquet format.\n\n Examples\n --------\n >>> df = pd.DataFrame(data={\"col1\": [1, 2], \"col2\": [4, 3]})\n >>> df.to_iceberg(\"my_table\", catalog_name=\"my_catalog\") # doctest: +SKIP\n \"\"\"\n from pandas.io.iceberg import to_iceberg\n\n to_iceberg(\n self,\n table_identifier,\n catalog_name,\n catalog_properties=catalog_properties,\n location=location,\n append=append,\n snapshot_properties=snapshot_properties,\n )\n\n # ----------------------------------------------------------------------\n def info(\n self,\n verbose: bool | None = None,\n buf: WriteBuffer[str] | None = None,\n max_cols: int | None = None,\n memory_usage: bool | str | None = None,\n show_counts: bool | None = None,\n ) -> None:\n \"\"\"\n Print a concise summary of a DataFrame.\n\n This method prints information about a DataFrame including\n the index dtype and columns, non-NA values and memory usage.\n\n Parameters\n ----------\n verbose : bool, optional\n Whether to print the full summary. By default, the setting in\n ``pandas.options.display.max_info_columns`` is followed.\n buf : writable buffer, defaults to sys.stdout\n Where to send the output. By default, the output is printed to\n sys.stdout. 
Pass a writable buffer if you need to further process\n the output.\n max_cols : int, optional\n When to switch from the verbose to the truncated output. If the\n DataFrame has more than `max_cols` columns, the truncated output\n is used. By default, the setting in\n ``pandas.options.display.max_info_columns`` is used.\n memory_usage : bool, str, optional\n Specifies whether total memory usage of the DataFrame\n elements (including the index) should be displayed. By default,\n this follows the ``pandas.options.display.memory_usage`` setting.\n\n True always show memory usage. False never shows memory usage.\n A value of 'deep' is equivalent to \"True with deep introspection\".\n Memory usage is shown in human-readable units (base-2\n representation). Without deep introspection a memory estimation is\n made based in column dtype and number of rows assuming values\n consume the same memory amount for corresponding dtypes. With deep\n memory introspection, a real memory usage calculation is performed\n at the cost of computational resources. See the\n :ref:`Frequently Asked Questions ` for more\n details.\n show_counts : bool, optional\n Whether to show the non-null counts. By default, this is shown\n only if the DataFrame is smaller than\n ``pandas.options.display.max_info_rows`` and\n ``pandas.options.display.max_info_columns``. A value of True always\n shows the counts, and False never shows the counts.\n\n Returns\n -------\n None\n This method prints a summary of a DataFrame and returns None.\n\n See Also\n --------\n DataFrame.describe: Generate descriptive statistics of DataFrame\n columns.\n DataFrame.memory_usage: Memory usage of DataFrame columns.\n\n Examples\n --------\n >>> int_values = [1, 2, 3, 4, 5]\n >>> text_values = [\"alpha\", \"beta\", \"gamma\", \"delta\", \"epsilon\"]\n >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]\n >>> df = pd.DataFrame(\n ... {\n ... \"int_col\": int_values,\n ... \"text_col\": text_values,\n ... 
\"float_col\": float_values,\n ... }\n ... )\n >>> df\n int_col text_col float_col\n 0 1 alpha 0.00\n 1 2 beta 0.25\n 2 3 gamma 0.50\n 3 4 delta 0.75\n 4 5 epsilon 1.00\n\n Prints information of all columns:\n\n >>> df.info(verbose=True)\n \n RangeIndex: 5 entries, 0 to 4\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 int_col 5 non-null int64\n 1 text_col 5 non-null str\n 2 float_col 5 non-null float64\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Prints a summary of columns count and its dtypes but not per column\n information:\n\n >>> df.info(verbose=False)\n \n RangeIndex: 5 entries, 0 to 4\n Columns: 3 entries, int_col to float_col\n dtypes: float64(1), int64(1), str(1)\n memory usage: 278.0 bytes\n\n Pipe output of DataFrame.info to buffer instead of sys.stdout, get\n buffer content and writes to a text file:\n\n >>> import io\n >>> buffer = io.StringIO()\n >>> df.info(buf=buffer)\n >>> s = buffer.getvalue()\n >>> with open(\"df_info.txt\", \"w\", encoding=\"utf-8\") as f: # doctest: +SKIP\n ... f.write(s)\n 260\n\n The `memory_usage` parameter allows deep introspection mode, specially\n useful for big DataFrames and fine-tune memory optimization:\n\n >>> random_strings_array = np.random.choice([\"a\", \"b\", \"c\"], 10**6)\n >>> df = pd.DataFrame(\n ... {\n ... \"column_1\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_2\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... \"column_3\": np.random.choice([\"a\", \"b\", \"c\"], 10**6),\n ... }\n ... 
)\n >>> df.info()\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n\n >>> df.info(memory_usage=\"deep\")\n \n RangeIndex: 1000000 entries, 0 to 999999\n Data columns (total 3 columns):\n # Column Non-Null Count Dtype\n --- ------ -------------- -----\n 0 column_1 1000000 non-null str\n 1 column_2 1000000 non-null str\n 2 column_3 1000000 non-null str\n dtypes: str(3)\n memory usage: 25.7 MB\n \"\"\"\n info = DataFrameInfo(\n data=self,\n memory_usage=memory_usage,\n )\n info.render(\n buf=buf,\n max_cols=max_cols,\n verbose=verbose,\n show_counts=show_counts,\n )\n\n def memory_usage(self, index: bool = True, deep: bool = False) -> Series:\n \"\"\"\n Return the memory usage of each column in bytes.\n\n The memory usage can optionally include the contribution of\n the index and elements of `object` dtype.\n\n This value is displayed in `DataFrame.info` by default. This can be\n suppressed by setting ``pandas.options.display.memory_usage`` to False.\n\n Parameters\n ----------\n index : bool, default True\n Specifies whether to include the memory usage of the DataFrame's\n index in returned Series. 
If ``index=True``, the memory usage of\n the index is the first item in the output.\n deep : bool, default False\n If True, introspect the data deeply by interrogating\n `object` dtypes for system-level memory consumption, and include\n it in the returned values.\n\n Returns\n -------\n Series\n A Series whose index is the original column names and whose values\n is the memory usage of each column in bytes.\n\n See Also\n --------\n numpy.ndarray.nbytes : Total bytes consumed by the elements of an\n ndarray.\n Series.memory_usage : Bytes consumed by a Series.\n Categorical : Memory-efficient array for string values with\n many repeated values.\n DataFrame.info : Concise summary of a DataFrame.\n\n Notes\n -----\n See the :ref:`Frequently Asked Questions ` for more\n details.\n\n Examples\n --------\n >>> dtypes = [\"int64\", \"float64\", \"complex128\", \"object\", \"bool\"]\n >>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])\n >>> df = pd.DataFrame(data)\n >>> df.head()\n int64 float64 complex128 object bool\n 0 1 1.0 1.0+0.0j 1 True\n 1 1 1.0 1.0+0.0j 1 True\n 2 1 1.0 1.0+0.0j 1 True\n 3 1 1.0 1.0+0.0j 1 True\n 4 1 1.0 1.0+0.0j 1 True\n\n >>> df.memory_usage()\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n >>> df.memory_usage(index=False)\n int64 40000\n float64 40000\n complex128 80000\n object 40000\n bool 5000\n dtype: int64\n\n The memory footprint of `object` dtype columns is ignored by default:\n\n >>> df.memory_usage(deep=True)\n Index 132\n int64 40000\n float64 40000\n complex128 80000\n object 180000\n bool 5000\n dtype: int64\n\n Use a Categorical for efficient storage of an object-dtype column with\n many repeated values.\n\n >>> df[\"object\"].astype(\"category\").memory_usage(deep=True)\n 5140\n \"\"\"\n result = self._constructor_sliced(\n [c.memory_usage(index=False, deep=deep) for col, c in self.items()],\n index=self.columns,\n dtype=np.intp,\n )\n if 
index:\n index_memory_usage = self._constructor_sliced(\n self.index.memory_usage(deep=deep), index=[\"Index\"]\n )\n result = index_memory_usage._append_internal(result)\n return result\n\n def transpose(\n self,\n *args,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Transpose index and columns.\n\n Reflect the DataFrame over its main diagonal by writing rows as columns\n and vice-versa. The property :attr:`.T` is an accessor to the method\n :meth:`transpose`.\n\n Parameters\n ----------\n *args : tuple, optional\n Accepted for compatibility with NumPy.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n Note that a copy is always required for mixed dtype DataFrames,\n or for DataFrames with any extension types.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n numpy.transpose : Permute the dimensions of a given array.\n\n Notes\n -----\n Transposing a DataFrame with mixed dtypes will result in a homogeneous\n DataFrame with the `object` dtype. 
In such a case, a copy of the data\n is always made.\n\n Examples\n --------\n **Square DataFrame with homogeneous dtype**\n\n >>> d1 = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df1 = pd.DataFrame(data=d1)\n >>> df1\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df1_transposed = df1.T # or df1.transpose()\n >>> df1_transposed\n 0 1\n col1 1 2\n col2 3 4\n\n When the dtype is homogeneous in the original DataFrame, we get a\n transposed DataFrame with the same dtype:\n\n >>> df1.dtypes\n col1 int64\n col2 int64\n dtype: object\n >>> df1_transposed.dtypes\n 0 int64\n 1 int64\n dtype: object\n\n **Non-square DataFrame with mixed dtypes**\n\n >>> d2 = {\n ... \"name\": [\"Alice\", \"Bob\"],\n ... \"score\": [9.5, 8],\n ... \"employed\": [False, True],\n ... \"kids\": [0, 0],\n ... }\n >>> df2 = pd.DataFrame(data=d2)\n >>> df2\n name score employed kids\n 0 Alice 9.5 False 0\n 1 Bob 8.0 True 0\n\n >>> df2_transposed = df2.T # or df2.transpose()\n >>> df2_transposed\n 0 1\n name Alice Bob\n score 9.5 8.0\n employed False True\n kids 0 0\n\n When the DataFrame has mixed dtypes, we get a transposed DataFrame with\n the `object` dtype:\n\n >>> df2.dtypes\n name str\n score float64\n employed bool\n kids int64\n dtype: object\n >>> df2_transposed.dtypes\n 0 object\n 1 object\n dtype: object\n \"\"\"\n self._check_copy_deprecation(copy)\n nv.validate_transpose(args, {})\n # construct the args\n\n first_dtype = self.dtypes.iloc[0] if len(self.columns) else None\n\n if self._can_fast_transpose:\n # Note: tests pass without this, but this improves perf quite a bit.\n new_vals = self._values.T\n\n result = self._constructor(\n new_vals,\n index=self.columns,\n columns=self.index,\n copy=False,\n dtype=new_vals.dtype,\n )\n if len(self) > 0:\n result._mgr.add_references(self._mgr)\n\n elif (\n self._is_homogeneous_type\n and first_dtype is not None\n and isinstance(first_dtype, ExtensionDtype)\n ):\n new_values: list\n if isinstance(first_dtype, BaseMaskedDtype):\n # We have masked arrays 
with the same dtype. We can transpose faster.\n from pandas.core.arrays.masked import (\n transpose_homogeneous_masked_arrays,\n )\n\n new_values = transpose_homogeneous_masked_arrays(\n cast(\"Sequence[BaseMaskedArray]\", self._iter_column_arrays())\n )\n elif isinstance(first_dtype, ArrowDtype):\n # We have arrow EAs with the same dtype. We can transpose faster.\n from pandas.core.arrays.arrow.array import (\n ArrowExtensionArray,\n transpose_homogeneous_pyarrow,\n )\n\n new_values = transpose_homogeneous_pyarrow(\n cast(\"Sequence[ArrowExtensionArray]\", self._iter_column_arrays())\n )\n else:\n # We have other EAs with the same dtype. We preserve dtype in transpose.\n arr_typ = first_dtype.construct_array_type()\n values = self.values\n new_values = [\n arr_typ._from_sequence(row, dtype=first_dtype) for row in values\n ]\n\n result = type(self)._from_arrays(\n new_values,\n index=self.columns,\n columns=self.index,\n verify_integrity=False,\n )\n\n else:\n new_arr = self.values.T\n result = self._constructor(\n new_arr,\n index=self.columns,\n columns=self.index,\n dtype=new_arr.dtype,\n # We already made a copy (more than one block)\n copy=False,\n )\n\n return result.__finalize__(self, method=\"transpose\")\n\n @property\n def T(self) -> DataFrame:\n \"\"\"\n The transpose of the DataFrame.\n\n This property returns a DataFrame with rows and columns interchanged,\n reflecting the data across the main diagonal.\n\n Returns\n -------\n DataFrame\n The transposed DataFrame.\n\n See Also\n --------\n DataFrame.transpose : Transpose index and columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n >>> df.T\n 0 1\n col1 1 2\n col2 3 4\n \"\"\"\n return self.transpose()\n\n # ----------------------------------------------------------------------\n # Indexing Methods\n\n def _ixs(self, i: int, axis: AxisInt = 0) -> Series:\n \"\"\"\n Parameters\n ----------\n i : int\n axis : int\n\n 
Returns\n -------\n Series\n \"\"\"\n if axis == 0:\n mgr = self._mgr.fast_xs(i)\n name = self.index[i]\n else:\n mgr = self._mgr.iget(i)\n # Lookup in columns so that if e.g. a str datetime was passed\n # we attach the Timestamp object as the name.\n name = self.columns[i]\n result = self._constructor_sliced_from_mgr(mgr, axes=mgr.axes)\n object.__setattr__(result, \"_name\", name)\n return result.__finalize__(self)\n\n def _get_column_array(self, i: int) -> ArrayLike:\n \"\"\"\n Get the values of the i'th column (ndarray or ExtensionArray, as stored\n in the Block)\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n return self._mgr.iget_values(i)\n\n def _iter_column_arrays(self) -> Iterator[ArrayLike]:\n \"\"\"\n Iterate over the arrays of all columns in order.\n This returns the values as stored in the Block (ndarray or ExtensionArray).\n\n Warning! The returned array is a view but doesn't handle Copy-on-Write,\n so this should be used with caution (for read-only purposes).\n \"\"\"\n for i in range(len(self.columns)):\n yield self._get_column_array(i)\n\n def __getitem__(self, key):\n check_dict_or_set_indexers(key)\n key = lib.item_from_zerodim(key)\n key = com.apply_if_callable(key, self)\n\n if is_hashable(key, allow_slice=False) and not is_iterator(key):\n # is_iterator to exclude generator e.g. 
test_getitem_listlike\n # As of Python 3.12, slice is hashable which breaks MultiIndex (GH#57500)\n\n # Shortcut: return single column as Series when key refers to one column.\n # Previously we used \"key in self.columns.drop_duplicates(keep=False)\",\n # which built a new Index on every access when columns had duplicates.\n # Using get_loc(key) instead: it returns int iff key appears exactly once,\n # so we get the same behavior without extra allocation (GH#45316).\n is_mi = isinstance(self.columns, MultiIndex)\n if not is_mi:\n try:\n loc = self.columns.get_loc(key)\n except (KeyError, InvalidIndexError):\n # Key missing or invalid; fall through to list/slice/other paths.\n pass\n else:\n # int: key unique; slice/array: key duplicated (fall through).\n if isinstance(loc, int):\n return self._get_item(key)\n elif is_mi and self.columns.is_unique and key in self.columns:\n return self._getitem_multilevel(key)\n\n # Do we have a slicer (on rows)?\n if isinstance(key, slice):\n return self._getitem_slice(key)\n\n # Do we have a (boolean) DataFrame?\n if isinstance(key, DataFrame):\n return self.where(key)\n\n # Do we have a (boolean) 1d indexer?\n if com.is_bool_indexer(key):\n return self._getitem_bool_array(key)\n\n # We are left with two options: a single key, and a collection of keys,\n # We interpret tuples as collections only for non-MultiIndex\n is_single_key = isinstance(key, tuple) or not is_list_like(key)\n\n if is_single_key:\n if self.columns.nlevels > 1:\n return self._getitem_multilevel(key)\n indexer = self.columns.get_loc(key)\n if is_integer(indexer):\n indexer = [indexer] # type: ignore[assignment]\n else:\n if is_iterator(key):\n key = list(key)\n indexer = self.columns._get_indexer_strict(key, \"columns\")[1]\n\n # take() does not accept boolean indexers\n if getattr(indexer, \"dtype\", None) == bool:\n indexer = np.where(indexer)[0] # type: ignore[arg-type, assignment]\n\n if isinstance(indexer, slice):\n return self._slice(indexer, axis=1)\n\n 
data = self.take(indexer, axis=1)\n\n if is_single_key:\n # What does looking for a single key in a non-unique index return?\n # The behavior is inconsistent. It returns a Series, except when\n # - the key itself is repeated (test on data.shape, #9519), or\n # - we have a MultiIndex on columns (test on self.columns, #21309)\n if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):\n # GH#26490 using data[key] can cause RecursionError\n return data._get_item(key)\n\n return data\n\n def _getitem_bool_array(self, key):\n # also raises Exception if object array with NA values\n # warning here just in case -- previously __setitem__ was\n # reindexing but __getitem__ was not; it seems more reasonable to\n # go with the __setitem__ behavior since that is more consistent\n # with all other indexing behavior\n if isinstance(key, Series) and not key.index.equals(self.index):\n warnings.warn(\n \"Boolean Series key will be reindexed to match DataFrame index.\",\n UserWarning,\n stacklevel=find_stack_level(),\n )\n elif len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}.\"\n )\n\n # check_bool_indexer will throw exception if Series key cannot\n # be reindexed to match DataFrame rows\n key = check_bool_indexer(self.index, key)\n\n if key.all():\n return self.copy(deep=False)\n\n indexer = key.nonzero()[0]\n return self.take(indexer, axis=0)\n\n def _getitem_multilevel(self, key):\n # self.columns is a MultiIndex\n assert isinstance(self.columns, MultiIndex)\n if isinstance(key, tuple) and any(\n isinstance(k, (slice, list, np.ndarray)) for k in key\n ):\n # Tuple key contains slices or lists, e.g. df[:, \"t1\"] which gives\n # key=(slice(None), \"t1\"), or df[[\"A\", \"B\"], \"t1\"] which gives\n # key=([\"A\", \"B\"], \"t1\"). 
Use get_locs which handles\n # per-level slicing and list selection (GH#26511)\n loc = self.columns.get_locs(key)\n new_columns = self.columns[loc]\n # Drop levels where a specific label was given (not slices/lists),\n # consistent with how df[\"A\"] drops the level used for selection\n levels_to_drop = [\n idx\n for idx, k in enumerate(key)\n if not isinstance(k, (slice, list, np.ndarray))\n ]\n if levels_to_drop:\n new_columns = new_columns.droplevel(levels_to_drop)\n result = self.iloc[:, loc]\n result.columns = new_columns\n return result\n\n loc = self.columns.get_loc(key)\n if isinstance(loc, (slice, np.ndarray)):\n new_columns = self.columns[loc]\n result_columns = maybe_droplevels(new_columns, key)\n result = self.iloc[:, loc]\n result.columns = result_columns\n\n # If there is only one column being returned, and its name is\n # either an empty string, or a tuple with an empty string as its\n # first element, then treat the empty string as a placeholder\n # and return the column as if the user had provided that empty\n # string in the key. If the result is a Series, exclude the\n # implied empty string from its name.\n if len(result.columns) == 1:\n # e.g. 
test_frame_getitem_multicolumn_empty_level,\n # test_frame_mixed_depth_get, test_loc_setitem_single_column_slice\n top = result.columns[0]\n if isinstance(top, tuple):\n top = top[0]\n if top == \"\":\n result = result[\"\"]\n if isinstance(result, Series):\n result = self._constructor_sliced(\n result, index=self.index, name=key\n )\n\n return result\n else:\n # loc is neither a slice nor ndarray, so must be an int\n return self._ixs(loc, axis=1)\n\n def _get_value(self, index, col, takeable: bool = False) -> Scalar:\n \"\"\"\n Quickly retrieve single value at passed column and index.\n\n Parameters\n ----------\n index : row label\n col : column label\n takeable : interpret the index/col as indexers, default False\n\n Returns\n -------\n scalar\n\n Notes\n -----\n Assumes that both `self.index._index_as_unique` and\n `self.columns._index_as_unique`; Caller is responsible for checking.\n \"\"\"\n if takeable:\n series = self._ixs(col, axis=1)\n return series._values[index]\n\n series = self._get_item(col)\n\n if not isinstance(self.index, MultiIndex):\n # CategoricalIndex: Trying to use the engine fastpath may give incorrect\n # results if our categories are integers that dont match our codes\n # IntervalIndex: IntervalTree has no get_loc\n row = self.index.get_loc(index)\n return series._values[row]\n\n # For MultiIndex going through engine effectively restricts us to\n # same-length tuples; see test_get_set_value_no_partial_indexing\n try:\n loc = self.index._engine.get_loc(index)\n except TypeError:\n # e.g. 
partial string slicing on DatetimeIndex level;\n # see GH#43395\n loc = self.index.get_loc(index)\n return series._values[loc]\n\n def isetitem(self, loc, value) -> None:\n \"\"\"\n Set the given value in the column with position `loc`.\n\n This is a positional analogue to ``__setitem__``.\n\n Parameters\n ----------\n loc : int or sequence of ints\n Index position for the column.\n value : scalar or arraylike\n Value(s) for the column.\n\n See Also\n --------\n DataFrame.iloc : Purely integer-location based indexing for selection by\n position.\n\n Notes\n -----\n ``frame.isetitem(loc, value)`` is an in-place method as it will\n modify the DataFrame in place (not returning a new object). In contrast to\n ``frame.iloc[:, i] = value`` which will try to update the existing values in\n place, ``frame.isetitem(loc, value)`` will not update the values of the column\n itself in place, it will instead insert a new array.\n\n In cases where ``frame.columns`` is unique, this is equivalent to\n ``frame[frame.columns[i]] = value``.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2], \"B\": [3, 4]})\n >>> df.isetitem(1, [5, 6])\n >>> df\n A B\n 0 1 5\n 1 2 6\n \"\"\"\n if isinstance(value, DataFrame):\n if is_integer(loc):\n loc = [loc]\n\n if len(loc) != len(value.columns):\n raise ValueError(\n f\"Got {len(loc)} positions but value has {len(value.columns)} \"\n f\"columns.\"\n )\n\n for i, idx in enumerate(loc):\n arraylike, refs = self._sanitize_column(value.iloc[:, i])\n self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)\n return\n\n arraylike, refs = self._sanitize_column(value)\n self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)\n\n def __setitem__(self, key, value) -> None:\n \"\"\"\n Set item(s) in DataFrame by key.\n\n This method allows you to set the values of one or more columns in the\n DataFrame using a key. 
If the key does not exist, a new\n column will be created.\n\n Parameters\n ----------\n key : The object(s) in the index which are to be assigned to\n Column label(s) to set. Can be a single column name, list of column names,\n or tuple for MultiIndex columns.\n value : scalar, array-like, Series, or DataFrame\n Value(s) to set for the specified key(s).\n\n Returns\n -------\n None\n This method does not return a value.\n\n See Also\n --------\n DataFrame.loc : Access and set values by label-based indexing.\n DataFrame.iloc : Access and set values by position-based indexing.\n DataFrame.assign : Assign new columns to a DataFrame.\n\n Notes\n -----\n When assigning a Series to a DataFrame column, pandas aligns the Series\n by index labels, not by position. In effect, the Series is reindexed to\n the DataFrame's index before assignment. This means:\n\n * Values from the Series are matched to DataFrame rows by index label\n * If a Series index label doesn't exist in the DataFrame index, it's ignored\n * If a DataFrame index label doesn't exist in the Series index, NaN is assigned\n * The order of values in the Series doesn't matter; only the index labels matter\n\n Examples\n --------\n Basic column assignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]})\n >>> df[\"B\"] = [4, 5, 6] # Assigns by position\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n Series assignment with index alignment:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3]}, index=[0, 1, 2])\n >>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df\n >>> df[\"B\"] = s # Assigns by index label, not position\n >>> df\n A B\n 0 1 NaN\n 1 2 10.0\n 2 3 NaN\n\n Series assignment with partial index match:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3, 4]}, index=[\"a\", \"b\", \"c\", \"d\"])\n >>> s = pd.Series([100, 200], index=[\"b\", \"d\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n a 1 NaN\n b 2 100.0\n c 3 NaN\n d 4 200.0\n\n Series index labels NOT in DataFrame, ignored:\n\n >>> df = 
pd.DataFrame({\"A\": [1, 2, 3]}, index=[\"x\", \"y\", \"z\"])\n >>> s = pd.Series([10, 20, 30, 40, 50], index=[\"x\", \"y\", \"a\", \"b\", \"z\"])\n >>> df[\"B\"] = s\n >>> df\n A B\n x 1 10\n y 2 20\n z 3 50\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(self) <= REF_COUNT and not com.is_local_in_caller_frame(\n self\n ):\n warnings.warn(\n _chained_assignment_msg, ChainedAssignmentError, stacklevel=2\n )\n\n key = com.apply_if_callable(key, self)\n\n # see if we can slice the rows\n if isinstance(key, slice):\n slc = self.index._convert_slice_indexer(key, kind=\"getitem\")\n return self._setitem_slice(slc, value)\n\n if isinstance(key, DataFrame) or getattr(key, \"ndim\", None) == 2:\n self._setitem_frame(key, value)\n elif isinstance(key, (Series, np.ndarray, list, Index)):\n self._setitem_array(key, value)\n elif isinstance(value, DataFrame):\n self._set_item_frame_value(key, value)\n elif (\n is_list_like(value)\n and not self.columns.is_unique\n and 1 < len(self.columns.get_indexer_for([key])) == len(value)\n ):\n # Column to set is duplicated\n self._setitem_array([key], value)\n else:\n # set column\n self._set_item(key, value)\n\n def _setitem_slice(self, key: slice, value) -> None:\n # NB: we can't just use self.loc[key] = value because that\n # operates on labels and we need to operate positional for\n # backwards-compat, xref GH#31469\n self.iloc[key] = value\n\n def _setitem_array(self, key, value) -> None:\n # also raises Exception if object array with NA values\n if com.is_bool_indexer(key):\n # bool indexer is indexing along rows\n if len(key) != len(self.index):\n raise ValueError(\n f\"Item wrong length {len(key)} instead of {len(self.index)}!\"\n )\n key = check_bool_indexer(self.index, key)\n indexer = key.nonzero()[0]\n if isinstance(value, DataFrame):\n # GH#39931 reindex since iloc does not align\n value = value.reindex(self.index.take(indexer))\n self.iloc[indexer] = value\n\n # Note: unlike self.iloc[:, indexer] = value, 
this will\n # never try to overwrite values inplace\n\n elif isinstance(value, DataFrame):\n check_key_length(self.columns, key, value)\n for k1, k2 in zip(key, value.columns, strict=False):\n self[k1] = value[k2]\n\n elif not is_list_like(value):\n for col in key:\n self[col] = value\n\n elif isinstance(value, np.ndarray) and value.ndim == 2:\n self._iset_not_inplace(key, value)\n\n elif np.ndim(value) > 1:\n # list of lists\n value = DataFrame(value).values\n self._setitem_array(key, value)\n\n else:\n self._iset_not_inplace(key, value)\n\n def _iset_not_inplace(self, key, value) -> None:\n # GH#39510 when setting with df[key] = obj with a list-like key and\n # list-like value, we iterate over those listlikes and set columns\n # one at a time. This is different from dispatching to\n # `self.loc[:, key]= value` because loc.__setitem__ may overwrite\n # data inplace, whereas this will insert new arrays.\n\n def igetitem(obj, i: int):\n # Note: we catch DataFrame obj before getting here, but\n # hypothetically would return obj.iloc[:, i]\n if isinstance(obj, np.ndarray):\n return obj[..., i]\n else:\n return obj[i]\n\n if self.columns.is_unique:\n if np.shape(value)[-1] != len(key):\n raise ValueError(\"Columns must be same length as key\")\n\n for i, col in enumerate(key):\n self[col] = igetitem(value, i)\n\n else:\n ilocs = self.columns.get_indexer_non_unique(key)[0]\n if (ilocs < 0).any():\n # key entries not in self.columns\n raise NotImplementedError\n\n if np.shape(value)[-1] != len(ilocs):\n raise ValueError(\"Columns must be same length as key\")\n\n assert np.ndim(value) <= 2\n\n orig_columns = self.columns\n\n # Using self.iloc[:, i] = ... 
may set values inplace, which\n # by convention we do not do in __setitem__\n try:\n self.columns = Index(range(len(self.columns)))\n for i, iloc in enumerate(ilocs):\n self[iloc] = igetitem(value, i)\n finally:\n self.columns = orig_columns\n\n def _setitem_frame(self, key, value) -> None:\n # support boolean setting with DataFrame input, e.g.\n # df[df > df2] = 0\n if isinstance(key, np.ndarray):\n if key.shape != self.shape:\n raise ValueError(\"Array conditional must be same shape as self\")\n key = self._constructor(key, **self._construct_axes_dict(), copy=False)\n\n if key.size and not all(is_bool_dtype(blk.dtype) for blk in key._mgr.blocks):\n raise TypeError(\n \"Must pass DataFrame or 2-d ndarray with boolean values only\"\n )\n\n self._where(-key, value, inplace=True)\n\n def _set_item_frame_value(self, key, value: DataFrame) -> None:\n self._ensure_valid_index(value)\n\n # align columns\n if key in self.columns:\n loc = self.columns.get_loc(key)\n cols = self.columns[loc]\n len_cols = 1 if is_scalar(cols) or isinstance(cols, tuple) else len(cols)\n if len_cols != len(value.columns):\n raise ValueError(\"Columns must be same length as key\")\n\n # align right-hand-side columns if self.columns\n # is multi-index and self[key] is a sub-frame\n if isinstance(self.columns, MultiIndex) and isinstance(\n loc, (slice, Series, np.ndarray, Index)\n ):\n cols_droplevel = maybe_droplevels(cols, key)\n if (\n not isinstance(cols_droplevel, MultiIndex)\n and is_string_dtype(cols_droplevel.dtype)\n and not cols_droplevel.any()\n ):\n # if cols_droplevel contains only empty strings,\n # value.reindex(cols_droplevel, axis=1) would be full of NaNs\n # see GH#62518 and GH#61841\n return\n if len(cols_droplevel) and not cols_droplevel.equals(value.columns):\n value = value.reindex(cols_droplevel, axis=1)\n\n if not cols_droplevel.equals(cols):\n # Levels were actually dropped, so we can safely use\n # key-based indexing without re-entering this method.\n for col, 
col_droplevel in zip(cols, cols_droplevel, strict=True):\n self[col] = value[col_droplevel]\n return\n # If cols_droplevel == cols (key matched all levels),\n # fall through to positional isetitem to avoid\n # infinite recursion (GH#53498).\n\n if is_scalar(cols):\n self[cols] = value[value.columns[0]]\n return\n\n locs: np.ndarray | list\n if isinstance(loc, slice):\n locs = np.arange(loc.start, loc.stop, loc.step)\n elif is_scalar(loc):\n locs = [loc]\n else:\n locs = loc.nonzero()[0] # type: ignore[union-attr]\n\n return self.isetitem(locs, value)\n\n if len(value.columns) > 1:\n raise ValueError(\n \"Cannot set a DataFrame with multiple columns to the single \"\n f\"column {key}\"\n )\n elif len(value.columns) == 0:\n raise ValueError(\n f\"Cannot set a DataFrame without columns to the column {key}\"\n )\n\n self[key] = value[value.columns[0]]\n\n def _iset_item_mgr(\n self,\n loc: int | slice | np.ndarray,\n value,\n inplace: bool = False,\n refs: BlockValuesRefs | None = None,\n ) -> None:\n # when called from _set_item_mgr loc can be anything returned from get_loc\n self._mgr.iset(loc, value, inplace=inplace, refs=refs)\n\n def _set_item_mgr(\n self, key, value: ArrayLike, refs: BlockValuesRefs | None = None\n ) -> None:\n try:\n loc = self._info_axis.get_loc(key)\n except KeyError:\n # This item wasn't present, just insert at end\n self._mgr.insert(len(self._info_axis), key, value, refs)\n else:\n self._iset_item_mgr(loc, value, refs=refs)\n\n def _iset_item(self, loc: int, value: Series, inplace: bool = True) -> None:\n # We are only called from _replace_columnwise which guarantees that\n # no reindex is necessary\n self._iset_item_mgr(loc, value._values, inplace=inplace, refs=value._references)\n\n def _set_item(self, key, value) -> None:\n \"\"\"\n Add series to DataFrame in specified column.\n\n If series is a numpy-array (not a Series/TimeSeries), it must be the\n same length as the DataFrames index or an error will be thrown.\n\n Series/TimeSeries 
will be conformed to the DataFrame's index to\n ensure homogeneity.\n \"\"\"\n value, refs = self._sanitize_column(value)\n\n if (\n key in self.columns\n and value.ndim == 1\n and not isinstance(value.dtype, ExtensionDtype)\n ):\n # broadcast across multiple columns if necessary\n if not self.columns.is_unique or isinstance(self.columns, MultiIndex):\n existing_piece = self[key]\n if isinstance(existing_piece, DataFrame):\n value = np.tile(value, (len(existing_piece.columns), 1)).T\n refs = None\n\n self._set_item_mgr(key, value, refs)\n\n def _set_value(\n self, index: IndexLabel, col, value: Scalar, takeable: bool = False\n ) -> None:\n \"\"\"\n Put single value at passed column and index.\n\n Parameters\n ----------\n index : Label\n row label\n col : Label\n column label\n value : scalar\n takeable : bool, default False\n Sets whether or not index/col are interpreted as indexers\n \"\"\"\n try:\n if takeable:\n icol = col\n iindex = cast(\"int\", index)\n else:\n icol = self.columns.get_loc(col)\n iindex = self.index.get_loc(index) # type: ignore[assignment]\n self._mgr.column_setitem(icol, iindex, value, inplace_only=True)\n\n except (KeyError, TypeError, ValueError, LossySetitemError):\n # get_loc might raise a KeyError for missing labels (falling back\n # to (i)loc will do expansion of the index)\n # column_setitem will do validation that may raise TypeError,\n # ValueError, or LossySetitemError\n # set using a non-recursive method & reset the cache\n if takeable:\n self.iloc[index, col] = value\n else:\n self.loc[index, col] = value\n\n except InvalidIndexError as ii_err:\n # GH48729: Seems like you are trying to assign a value to a\n # row when only scalar options are permitted\n raise InvalidIndexError(\n f\"You can only assign a scalar value not a {type(value)}\"\n ) from ii_err\n\n def _ensure_valid_index(self, value) -> None:\n \"\"\"\n Ensure that if we don't have an index, that we can create one from the\n passed value.\n \"\"\"\n # GH5632, make sure 
that we are a Series convertible\n if not len(self.index) and is_list_like(value) and len(value):\n if not isinstance(value, DataFrame):\n try:\n value = Series(value)\n except (ValueError, NotImplementedError, TypeError) as err:\n raise ValueError(\n \"Cannot set a frame with no defined index \"\n \"and a value that cannot be converted to a Series\"\n ) from err\n\n # GH31368 preserve name of index\n index_copy = value.index.copy()\n if self.index.name is not None:\n index_copy.name = self.index.name\n\n self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)\n\n def _get_item(self, item: Hashable) -> Series:\n loc = self.columns.get_loc(item)\n return self._ixs(loc, axis=1) # type: ignore[arg-type]\n\n # ----------------------------------------------------------------------\n # Unsorted\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[False] = ...,\n ) -> DataFrame: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: Literal[True],\n ) -> None: ...\n\n @overload\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = ...,\n engine: Literal[\"python\", \"numexpr\"] | None = ...,\n local_dict: dict[str, Any] | None = ...,\n global_dict: dict[str, Any] | None = ...,\n resolvers: list[Mapping] | None = ...,\n level: int = ...,\n inplace: bool = ...,\n ) -> DataFrame | None: ...\n\n def query(\n self,\n expr: str,\n *,\n parser: Literal[\"pandas\", \"python\"] = \"pandas\",\n engine: 
Literal[\"python\", \"numexpr\"] | None = None,\n local_dict: dict[str, Any] | None = None,\n global_dict: dict[str, Any] | None = None,\n resolvers: list[Mapping] | None = None,\n level: int = 0,\n inplace: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Query the columns of a DataFrame with a boolean expression.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The query string to evaluate.\n\n See the documentation for :func:`eval` for details of\n supported operations and functions in the query string.\n\n See the documentation for :meth:`DataFrame.eval` for details on\n referring to column names and variables in the query string.\n parser : {'pandas', 'python'}, default 'pandas'\n The parser to use to construct the syntax tree from the expression. The\n default of ``'pandas'`` parses code slightly different than standard\n Python. Alternatively, you can parse an expression using the\n ``'python'`` parser to retain strict Python semantics. See the\n :ref:`enhancing performance ` documentation for\n more details.\n engine : {'python', 'numexpr'}, default 'numexpr'\n\n The engine used to evaluate the expression. Supported engines are\n\n - None : tries to use ``numexpr``, falls back to ``python``\n - ``'numexpr'`` : This default engine evaluates pandas objects using\n numexpr for large speed ups in complex expressions with large frames.\n - ``'python'`` : Performs operations as if you had ``eval``'d in top\n level python. 
This engine is generally not that useful.\n\n More backends may be available in the future.\n local_dict : dict or None, optional\n A dictionary of local variables, taken from locals() by default.\n global_dict : dict or None, optional\n A dictionary of global variables, taken from globals() by default.\n resolvers : list of dict-like or None, optional\n A list of objects implementing the ``__getitem__`` special method that\n you can use to inject an additional collection of namespaces to use for\n variable lookup. For example, this is used in the\n :meth:`~DataFrame.query` method to inject the\n ``DataFrame.index`` and ``DataFrame.columns``\n variables that refer to their respective :class:`~pandas.DataFrame`\n instance attributes.\n level : int, optional\n The number of prior stack frames to traverse and add to the current\n scope. Most users will **not** need to change this parameter.\n inplace : bool\n Whether to modify the DataFrame rather than creating a new one.\n\n Returns\n -------\n DataFrame or None\n DataFrame resulting from the provided query expression or\n None if ``inplace=True``.\n\n See Also\n --------\n eval : Evaluate a string describing operations on\n DataFrame columns.\n DataFrame.eval : Evaluate a string describing operations on\n DataFrame columns.\n\n Notes\n -----\n The result of the evaluation of this expression is first passed to\n :attr:`DataFrame.loc` and if that fails because of a\n multidimensional key (e.g., a DataFrame) then the result will be passed\n to :meth:`DataFrame.__getitem__`.\n\n This method uses the top-level :func:`eval` function to\n evaluate the passed query.\n\n The :meth:`~pandas.DataFrame.query` method uses a slightly\n modified Python syntax by default. For example, the ``&`` and ``|``\n (bitwise) operators have the precedence of their boolean cousins,\n :keyword:`and` and :keyword:`or`. 
This *is* syntactically valid Python,\n however the semantics are different.\n\n You can change the semantics of the expression by passing the keyword\n argument ``parser='python'``. This enforces the same semantics as\n evaluation in Python space. Likewise, you can pass ``engine='python'``\n to evaluate an expression using Python itself as a backend. This is not\n recommended as it is inefficient compared to using ``numexpr`` as the\n engine.\n\n The :attr:`DataFrame.index` and\n :attr:`DataFrame.columns` attributes of the\n :class:`~pandas.DataFrame` instance are placed in the query namespace\n by default, which allows you to treat both the index and columns of the\n frame as a column in the frame.\n The identifier ``index`` is used for the frame index; you can also\n use the name of the index to identify it in a query. Please note that\n Python keywords may not be used as identifiers.\n\n For further details and examples see the ``query`` documentation in\n :ref:`indexing `.\n\n *Backtick quoted variables*\n\n Backtick quoted variables are parsed as literal Python code and\n are converted internally to a Python valid identifier.\n This can lead to the following problems.\n\n During parsing a number of disallowed characters inside the backtick\n quoted string are replaced by strings that are allowed as a Python identifier.\n These characters include all operators in Python, the space character, the\n question mark, the exclamation mark, the dollar sign, and the euro sign.\n\n A backtick can be escaped by double backticks.\n\n See also the `Python documentation about lexical analysis\n `__\n in combination with the source code in :mod:`pandas.core.computation.parsing`.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... 
)\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.query(\"A > B\")\n A B C&C\n 4 5 2 6\n\n The previous expression is equivalent to\n\n >>> df[df.A > df.B]\n A B C&C\n 4 5 2 6\n\n For columns with spaces in their name, you can use backtick quoting.\n\n >>> df.query(\"B == `C&C`\")\n A B C&C\n 0 1 10 10\n\n The previous expression is equivalent to\n\n >>> df[df.B == df[\"C&C\"]]\n A B C&C\n 0 1 10 10\n\n Using local variable:\n\n >>> local_var = 2\n >>> df.query(\"A <= @local_var\")\n A B C&C\n 0 1 10 10\n 1 2 8 9\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if not isinstance(expr, str):\n msg = f\"expr must be a string to be evaluated, {type(expr)} given\"\n raise ValueError(msg)\n\n res = self.eval(\n expr,\n level=level + 1,\n parser=parser,\n target=None,\n engine=engine,\n local_dict=local_dict,\n global_dict=global_dict,\n resolvers=resolvers or (),\n )\n\n try:\n result = self.loc[res]\n except ValueError:\n # when res is multi-dimensional loc raises, but this is sometimes a\n # valid query\n result = self[res]\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[False] = ..., **kwargs) -> Any: ...\n\n @overload\n def eval(self, expr: str, *, inplace: Literal[True], **kwargs) -> None: ...\n\n def eval(self, expr: str, *, inplace: bool = False, **kwargs) -> Any | None:\n \"\"\"\n Evaluate a string describing operations on DataFrame columns.\n\n .. warning::\n\n This method can run arbitrary code which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Operates on columns only, not specific rows or elements. 
This allows\n `eval` to run arbitrary code, which can make you vulnerable to code\n injection if you pass user input to this function.\n\n Parameters\n ----------\n expr : str\n The expression string to evaluate.\n\n You can refer to variables\n in the environment by prefixing them with an '@' character like\n ``@a + b``.\n\n You can refer to column names that are not valid Python variable names\n by surrounding them in backticks. Thus, column names containing spaces\n or punctuation (besides underscores) or starting with digits must be\n surrounded by backticks. (For example, a column named \"Area (cm^2)\" would\n be referenced as ```Area (cm^2)```). Column names which are Python keywords\n (like \"if\", \"for\", \"import\", etc) cannot be used.\n\n For example, if one of your columns is called ``a a`` and you want\n to sum it with ``b``, your query should be ```a a` + b``.\n\n See the documentation for :func:`eval` for full details of\n supported operations and functions in the expression string.\n inplace : bool, default False\n If the expression contains an assignment, whether to perform the\n operation inplace and mutate the existing DataFrame. Otherwise,\n a new DataFrame is returned.\n **kwargs\n See the documentation for :func:`eval` for complete details\n on the keyword arguments accepted by\n :meth:`~pandas.DataFrame.eval`.\n\n Returns\n -------\n ndarray, scalar, pandas object, or None\n The result of the evaluation or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.query : Evaluates a boolean expression to query the columns\n of a frame.\n DataFrame.assign : Can evaluate an expression or function to create new\n values for a column.\n eval : Evaluate a Python expression as a string using various\n backends.\n\n Notes\n -----\n For more details see the API documentation for :func:`~eval`.\n For detailed examples see :ref:`enhancing performance with eval\n `.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... 
{\"A\": range(1, 6), \"B\": range(10, 0, -2), \"C&C\": range(10, 5, -1)}\n ... )\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n >>> df.eval(\"A + B\")\n 0 11\n 1 10\n 2 9\n 3 8\n 4 7\n dtype: int64\n\n Assignment is allowed though by default the original DataFrame is not\n modified.\n\n >>> df.eval(\"D = A + B\")\n A B C&C D\n 0 1 10 10 11\n 1 2 8 9 10\n 2 3 6 8 9\n 3 4 4 7 8\n 4 5 2 6 7\n >>> df\n A B C&C\n 0 1 10 10\n 1 2 8 9\n 2 3 6 8\n 3 4 4 7\n 4 5 2 6\n\n Multiple columns can be assigned to using multi-line expressions:\n\n >>> df.eval(\n ... '''\n ... D = A + B\n ... E = A - B\n ... '''\n ... )\n A B C&C D E\n 0 1 10 10 11 -9\n 1 2 8 9 10 -6\n 2 3 6 8 9 -3\n 3 4 4 7 8 0\n 4 5 2 6 7 3\n\n For columns with spaces or other disallowed characters in their name, you can\n use backtick quoting.\n\n >>> df.eval(\"B * `C&C`\")\n 0 100\n 1 72\n 2 48\n 3 28\n 4 12\n dtype: int64\n\n Local variables shall be explicitly referenced using ``@``\n character in front of the name:\n\n >>> local_var = 2\n >>> df.eval(\"@local_var * A\")\n 0 2\n 1 4\n 2 6\n 3 8\n 4 10\n Name: A, dtype: int64\n \"\"\"\n from pandas.core.computation.eval import eval as _eval\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n kwargs[\"level\"] = kwargs.pop(\"level\", 0) + 1\n index_resolvers = self._get_index_resolvers()\n column_resolvers = self._get_cleaned_column_resolvers()\n resolvers = column_resolvers, index_resolvers\n if \"target\" not in kwargs:\n kwargs[\"target\"] = self\n kwargs[\"resolvers\"] = tuple(kwargs.get(\"resolvers\", ())) + resolvers\n\n return _eval(expr, inplace=inplace, **kwargs)\n\n def select_dtypes(self, include=None, exclude=None) -> DataFrame:\n \"\"\"\n Return a subset of the DataFrame's columns based on the column dtypes.\n\n This method allows for filtering columns based on their data types.\n It is useful when working with heterogeneous DataFrames where operations\n need to be performed on a specific subset of data types.\n\n 
Parameters\n ----------\n include, exclude : scalar or list-like\n A selection of dtypes or strings to be included/excluded. At least\n one of these parameters must be supplied.\n\n Returns\n -------\n DataFrame\n The subset of the frame including the dtypes in ``include`` and\n excluding the dtypes in ``exclude``.\n\n Raises\n ------\n ValueError\n * If both of ``include`` and ``exclude`` are empty\n * If ``include`` and ``exclude`` have overlapping elements\n TypeError\n * If any kind of string dtype is passed in.\n\n See Also\n --------\n DataFrame.dtypes: Return Series with the data type of each column.\n\n Notes\n -----\n * To select all *numeric* types, use ``np.number`` or ``'number'``\n * To select strings you must use the ``object`` dtype, but note that\n this will return *all* object dtype columns. With\n ``pd.options.future.infer_string`` enabled, using ``\"str\"`` will\n work to select all string columns.\n * See the `numpy dtype hierarchy\n `__\n * To select datetimes, use ``np.datetime64``, ``'datetime'`` or\n ``'datetime64'``\n * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or\n ``'timedelta64'``\n * To select Pandas categorical dtypes, use ``'category'``\n * To select Pandas datetimetz dtypes, use ``'datetimetz'``\n or ``'datetime64[ns, tz]'``\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"a\": [1, 2] * 3, \"b\": [True, False] * 3, \"c\": [1.0, 2.0] * 3}\n ... 
)\n >>> df\n a b c\n 0 1 True 1.0\n 1 2 False 2.0\n 2 1 True 1.0\n 3 2 False 2.0\n 4 1 True 1.0\n 5 2 False 2.0\n\n >>> df.select_dtypes(include=\"bool\")\n b\n 0 True\n 1 False\n 2 True\n 3 False\n 4 True\n 5 False\n\n >>> df.select_dtypes(include=[\"float64\"])\n c\n 0 1.0\n 1 2.0\n 2 1.0\n 3 2.0\n 4 1.0\n 5 2.0\n\n >>> df.select_dtypes(exclude=[\"int64\"])\n b c\n 0 True 1.0\n 1 False 2.0\n 2 True 1.0\n 3 False 2.0\n 4 True 1.0\n 5 False 2.0\n \"\"\"\n if not is_list_like(include):\n include = (include,) if include is not None else ()\n if not is_list_like(exclude):\n exclude = (exclude,) if exclude is not None else ()\n\n selection = (frozenset(include), frozenset(exclude))\n\n if not any(selection):\n raise ValueError(\"at least one of include or exclude must be nonempty\")\n\n # convert the myriad valid dtypes object to a single representation\n def check_int_infer_dtype(dtypes):\n converted_dtypes: list[type] = []\n for dtype in dtypes:\n # Numpy maps int to different types (int32, in64) on Windows and Linux\n # see https://github.com/numpy/numpy/issues/9464\n if (isinstance(dtype, str) and dtype == \"int\") or (dtype is int):\n converted_dtypes.append(np.int32)\n converted_dtypes.append(np.int64)\n elif dtype == \"float\" or dtype is float:\n # GH#42452 : np.dtype(\"float\") coerces to np.float64 from Numpy 1.20\n converted_dtypes.extend([np.float64, np.float32])\n else:\n converted_dtypes.append(infer_dtype_from_object(dtype))\n return frozenset(converted_dtypes)\n\n include = check_int_infer_dtype(include)\n exclude = check_int_infer_dtype(exclude)\n\n for dtypes in (include, exclude):\n invalidate_string_dtypes(dtypes)\n\n # can't both include AND exclude!\n if not include.isdisjoint(exclude):\n raise ValueError(f\"include and exclude overlap on {(include & exclude)}\")\n\n def dtype_predicate(dtype: DtypeObj, dtypes_set) -> bool:\n # GH 46870: BooleanDtype._is_numeric == True but should be excluded\n dtype = dtype if not isinstance(dtype, ArrowDtype) 
else dtype.numpy_dtype\n return (\n issubclass(dtype.type, tuple(dtypes_set))\n or (\n np.number in dtypes_set\n and getattr(dtype, \"_is_numeric\", False)\n and not is_bool_dtype(dtype)\n )\n # backwards compat for the default `str` dtype being selected by object\n or (\n isinstance(dtype, StringDtype)\n and dtype.na_value is np.nan\n and np.object_ in dtypes_set\n )\n )\n\n def predicate(arr: ArrayLike) -> bool:\n dtype = arr.dtype\n if include:\n if not dtype_predicate(dtype, include):\n return False\n\n if exclude:\n if dtype_predicate(dtype, exclude):\n return False\n\n return True\n\n blk_dtypes = [blk.dtype for blk in self._mgr.blocks]\n if (\n np.object_ in include\n and str not in include\n and str not in exclude\n and any(\n isinstance(dtype, StringDtype) and dtype.na_value is np.nan\n for dtype in blk_dtypes\n )\n ):\n # GH#61916\n warnings.warn(\n \"For backward compatibility, 'str' dtypes are included by \"\n \"select_dtypes when 'object' dtype is specified. \"\n \"This behavior is deprecated and will be removed in a future \"\n \"version. 
Explicitly pass 'str' to `include` to select them, \"\n \"or to `exclude` to remove them and silence this warning.\\nSee \"\n \"https://pandas.pydata.org/docs/user_guide/migration-3-strings.html\"\n \"#string-migration-select-dtypes for details on how to write code \"\n \"that works with pandas 2 and 3.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n mgr = self._mgr._get_data_subset(predicate).copy(deep=False)\n return self._constructor_from_mgr(mgr, axes=mgr.axes).__finalize__(self)\n\n def _select_dtypes_indices(self, dtype_class) -> np.ndarray:\n \"\"\"\n Return the indices of the columns of a given dtype.\n\n Currently only works given a class, so mostly useful for ExtensionDtypes.\n \"\"\"\n\n def predicate(arr: ArrayLike) -> bool:\n return isinstance(arr.dtype, dtype_class)\n\n return self._mgr._get_data_subset_indices(predicate)\n\n def insert(\n self,\n loc: int,\n column: Hashable,\n value: object,\n allow_duplicates: bool = False,\n ) -> None:\n \"\"\"\n Insert column into DataFrame at specified location.\n\n Raises a ValueError if `column` is already contained in the DataFrame,\n unless `allow_duplicates` is set to True.\n\n Parameters\n ----------\n loc : int\n Insertion index. 
Must satisfy 0 <= loc <= len(columns).\n column : str, number, or hashable object\n Label of the inserted column.\n value : Scalar, Series, or array-like\n Content of the inserted column.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n\n See Also\n --------\n Index.insert : Insert new item by index.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"col1\": [1, 2], \"col2\": [3, 4]})\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n >>> df.insert(1, \"newcol\", [99, 99])\n >>> df\n col1 newcol col2\n 0 1 99 3\n 1 2 99 4\n >>> df.insert(0, \"col1\", [100, 100], allow_duplicates=True)\n >>> df\n col1 col1 newcol col2\n 0 100 1 99 3\n 1 100 2 99 4\n\n Notice that pandas aligns on the index when `value` is a `Series`:\n\n >>> df.insert(0, \"col0\", pd.Series([5, 6], index=[1, 2]))\n >>> df\n col0 col1 col1 newcol col2\n 0 NaN 100 1 99 3\n 1 5.0 100 2 99 4\n \"\"\"\n if allow_duplicates and not self.flags.allows_duplicate_labels:\n raise ValueError(\n \"Cannot specify 'allow_duplicates=True' when \"\n \"'self.flags.allows_duplicate_labels' is False.\"\n )\n if not allow_duplicates and column in self.columns:\n # Should this be a different kind of error??\n raise ValueError(f\"cannot insert {column}, already exists\")\n if not is_integer(loc):\n raise TypeError(\"loc must be int\")\n # convert non stdlib ints to satisfy typing checks\n loc = int(loc)\n if isinstance(value, DataFrame) and len(value.columns) > 1:\n raise ValueError(\n f\"Expected a one-dimensional object, got a DataFrame with \"\n f\"{len(value.columns)} columns instead.\"\n )\n elif isinstance(value, DataFrame):\n value = value.iloc[:, 0]\n\n value, refs = self._sanitize_column(value)\n self._mgr.insert(loc, column, value, refs=refs)\n\n def assign(self, **kwargs) -> DataFrame:\n r\"\"\"\n Assign new columns to a DataFrame.\n\n Returns a new object with all original columns in addition to new ones.\n Existing columns that are re-assigned will be overwritten.\n\n 
Parameters\n ----------\n **kwargs : callable, Series, scalar, array-like, or dict\n The column names are keywords. If the values are\n callable, they are computed on the DataFrame and\n assigned to the new columns. The callable must not\n change input DataFrame (though pandas doesn't check it).\n If the values are not callable (e.g. a Series, scalar, array,\n or dict), they are simply assigned. See the Notes section for\n details on alignment and broadcasting.\n\n Returns\n -------\n DataFrame\n A new DataFrame with the new columns in addition to\n all the existing columns.\n\n See Also\n --------\n DataFrame.loc : Select a subset of a DataFrame by labels.\n DataFrame.iloc : Select a subset of a DataFrame by positions.\n\n Notes\n -----\n Assigning multiple columns within the same ``assign`` is possible.\n Later items in '\\*\\*kwargs' may refer to newly created or modified\n columns in 'df'; items are computed and assigned into 'df' in order.\n Non-callable values (Series, arrays, scalars) follow the same\n alignment and broadcasting rules as :meth:`DataFrame.__setitem__`. 
See\n that method's documentation for details.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"temp_c\": [17.0, 25.0]}, index=[\"Portland\", \"Berkeley\"])\n >>> df\n temp_c\n Portland 17.0\n Berkeley 25.0\n\n Where the value is a callable, evaluated on `df`:\n\n >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n Alternatively, the same behavior can be achieved by directly\n referencing an existing Series or sequence:\n\n >>> df.assign(temp_f=df[\"temp_c\"] * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n or by using :meth:`pandas.col`:\n\n >>> df.assign(temp_f=pd.col(\"temp_c\") * 9 / 5 + 32)\n temp_c temp_f\n Portland 17.0 62.6\n Berkeley 25.0 77.0\n\n You can create multiple columns within the same assign where one\n of the columns depends on another one defined within the same assign:\n\n >>> df.assign(\n ... temp_f=lambda x: x[\"temp_c\"] * 9 / 5 + 32,\n ... temp_k=lambda x: (x[\"temp_f\"] + 459.67) * 5 / 9,\n ... 
)\n temp_c temp_f temp_k\n Portland 17.0 62.6 290.15\n Berkeley 25.0 77.0 298.15\n\n A dict value is aligned to the DataFrame's index by its keys, and\n index labels not present in the dict are filled with NaN:\n\n >>> df.assign(temp_k={\"Portland\": 290.15, \"Berkeley\": 298.15, \"Seattle\": 285.0})\n temp_c temp_k\n Portland 17.0 290.15\n Berkeley 25.0 298.15\n \"\"\"\n data = self.copy(deep=False)\n\n for k, v in kwargs.items():\n data[k] = com.apply_if_callable(v, data)\n return data\n\n def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:\n \"\"\"\n Ensures new columns (which go into the BlockManager as new blocks) are\n always copied (or a reference is being tracked to them under CoW)\n and converted into an array.\n\n Parameters\n ----------\n value : scalar, Series, or array-like\n\n Returns\n -------\n tuple of numpy.ndarray or ExtensionArray and optional BlockValuesRefs\n \"\"\"\n self._ensure_valid_index(value)\n\n # Using a DataFrame would mean coercing values to one dtype\n assert not isinstance(value, DataFrame)\n if is_dict_like(value):\n if not isinstance(value, Series):\n value = Series(value)\n return _reindex_for_setitem(value, self.index)\n\n if is_list_like(value):\n com.require_length_match(value, self.index)\n return sanitize_array(value, self.index, copy=True, allow_2d=True), None\n\n @property\n def _series(self):\n return {item: self._ixs(idx, axis=1) for idx, item in enumerate(self.columns)}\n\n # ----------------------------------------------------------------------\n # Reindexing and alignment\n\n def _reindex_multi(self, axes: dict[str, Index], fill_value) -> DataFrame:\n \"\"\"\n We are guaranteed non-Nones in the axes.\n \"\"\"\n\n new_index, row_indexer = self.index.reindex(axes[\"index\"])\n new_columns, col_indexer = self.columns.reindex(axes[\"columns\"])\n\n if row_indexer is not None and col_indexer is not None:\n # Fastpath. 
By doing two 'take's at once we avoid making an\n # unnecessary copy.\n # We only get here with `self._can_fast_transpose`, which (almost)\n # ensures that self.values is cheap. It may be worth making this\n # condition more specific.\n indexer = row_indexer, col_indexer\n new_values = take_2d_multi(self.values, indexer, fill_value=fill_value)\n return self._constructor(\n new_values, index=new_index, columns=new_columns, copy=False\n )\n else:\n return self._reindex_with_indexers(\n {0: [new_index, row_indexer], 1: [new_columns, col_indexer]},\n fill_value=fill_value,\n )\n\n def set_axis(\n self,\n labels,\n *,\n axis: Axis = 0,\n copy: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame:\n \"\"\"\n Assign desired index to given axis.\n\n Indexes for column or row labels can be changed by assigning\n a list-like or Index.\n\n Parameters\n ----------\n labels : list-like, Index\n The values for the new index.\n\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to update. The value 0 identifies the rows. For `Series`\n this parameter is unused and defaults to 0.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). 
See the `user guide on Copy-on-Write\n `__\n for more details.\n\n Returns\n -------\n DataFrame\n An object of type DataFrame.\n\n See Also\n --------\n DataFrame.rename_axis : Alter the name of the index or columns.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n\n Change the row labels.\n\n >>> df.set_axis([\"a\", \"b\", \"c\"], axis=\"index\")\n A B\n a 1 4\n b 2 5\n c 3 6\n\n Change the column labels.\n\n >>> df.set_axis([\"I\", \"II\"], axis=\"columns\")\n I II\n 0 1 4\n 1 2 5\n 2 3 6\n \"\"\"\n return super().set_axis(labels, axis=axis, copy=copy)\n\n def reindex(\n self,\n labels=None,\n *,\n index=None,\n columns=None,\n axis: Axis | None = None,\n method: ReindexMethod | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n fill_value: Scalar | None = np.nan,\n limit: int | None = None,\n tolerance=None,\n ) -> DataFrame:\n \"\"\"\n Conform DataFrame to new index with optional filling logic.\n\n Places NA/NaN in locations having no value in the previous index. A new object\n is produced unless the new index is equivalent to the current one and\n ``copy=False``.\n\n Parameters\n ----------\n\n labels : array-like, optional\n New labels / index to conform the axis specified by 'axis' to.\n index : array-like, optional\n New labels for the index. Preferably an Index object to avoid\n duplicating data.\n columns : array-like, optional\n New labels for the columns. Preferably an Index object to avoid\n duplicating data.\n axis : int or str, optional\n Axis to target. 
Can be either the axis name ('index', 'columns')\n or number (0, 1).\n method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}\n Method to use for filling holes in reindexed DataFrame.\n Please note: this is only applicable to DataFrames/Series with a\n monotonically increasing/decreasing index.\n\n * None (default): don't fill gaps\n * pad / ffill: Propagate last valid observation forward to next\n valid.\n * backfill / bfill: Use next valid observation to fill gap.\n * nearest: Use nearest valid observations to fill gap.\n\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n level : int or name\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : scalar, default np.nan\n Value to use for missing values. Defaults to NaN, but can be any\n \"compatible\" value.\n limit : int, default None\n Maximum number of consecutive elements to forward or backward fill.\n tolerance : optional\n Maximum distance between original and new labels for inexact\n matches. The values of the index at the matching locations must\n satisfy the equation ``abs(index[indexer] - target) <= tolerance``.\n\n Tolerance may be a scalar value, which applies the same tolerance\n to all values, or list-like, which applies variable tolerance per\n element. 
List-like includes list, tuple, array, Series, and must be\n the same size as the index and its dtype must exactly match the\n index's type.\n\n Returns\n -------\n DataFrame\n DataFrame with changed index.\n\n See Also\n --------\n DataFrame.set_index : Set row labels.\n DataFrame.reset_index : Remove row labels or move them to new columns.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n ``DataFrame.reindex`` supports two calling conventions\n\n * ``(index=index_labels, columns=column_labels, ...)``\n * ``(labels, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Create a DataFrame with some fictional data.\n\n >>> index = [\"Firefox\", \"Chrome\", \"Safari\", \"IE10\", \"Konqueror\"]\n >>> columns = [\"http_status\", \"response_time\"]\n >>> df = pd.DataFrame(\n ... [[200, 0.04], [200, 0.02], [404, 0.07], [404, 0.08], [301, 1.0]],\n ... columns=columns,\n ... index=index,\n ... )\n >>> df\n http_status response_time\n Firefox 200 0.04\n Chrome 200 0.02\n Safari 404 0.07\n IE10 404 0.08\n Konqueror 301 1.00\n\n Create a new index and reindex the DataFrame. By default\n values in the new index that do not have corresponding\n records in the DataFrame are assigned ``NaN``.\n\n >>> new_index = [\"Safari\", \"Iceweasel\", \"Comodo Dragon\", \"IE10\", \"Chrome\"]\n >>> df.reindex(new_index)\n http_status response_time\n Safari 404.0 0.07\n Iceweasel NaN NaN\n Comodo Dragon NaN NaN\n IE10 404.0 0.08\n Chrome 200.0 0.02\n\n We can fill in the missing values by passing a value to\n the keyword ``fill_value``. 
Because the index is not monotonically\n increasing or decreasing, we cannot use arguments to the keyword\n ``method`` to fill the ``NaN`` values.\n\n >>> df.reindex(new_index, fill_value=0)\n http_status response_time\n Safari 404 0.07\n Iceweasel 0 0.00\n Comodo Dragon 0 0.00\n IE10 404 0.08\n Chrome 200 0.02\n\n We can also reindex the columns.\n\n >>> df.reindex(columns=[\"http_status\", \"user_agent\"])\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n Or we can use \"axis-style\" keyword arguments\n\n >>> df.reindex([\"http_status\", \"user_agent\"], axis=\"columns\")\n http_status user_agent\n Firefox 200 NaN\n Chrome 200 NaN\n Safari 404 NaN\n IE10 404 NaN\n Konqueror 301 NaN\n\n To further illustrate the filling functionality in\n ``reindex``, we will create a DataFrame with a\n monotonically increasing index (for example, a sequence\n of dates).\n\n >>> date_index = pd.date_range(\"1/1/2010\", periods=6, freq=\"D\")\n >>> df2 = pd.DataFrame(\n ... {\"prices\": [100, 101, np.nan, 100, 89, 88]}, index=date_index\n ... 
)\n >>> df2\n prices\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n\n Suppose we decide to expand the DataFrame to cover a wider\n date range.\n\n >>> date_index2 = pd.date_range(\"12/29/2009\", periods=10, freq=\"D\")\n >>> df2.reindex(date_index2)\n prices\n 2009-12-29 NaN\n 2009-12-30 NaN\n 2009-12-31 NaN\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n The index entries that did not have a value in the original data frame\n (for example, '2009-12-29') are by default filled with ``NaN``.\n If desired, we can fill in the missing values using one of several\n options.\n\n For example, to back-propagate the last valid value to fill the ``NaN``\n values, pass ``bfill`` as an argument to the ``method`` keyword.\n\n >>> df2.reindex(date_index2, method=\"bfill\")\n prices\n 2009-12-29 100.0\n 2009-12-30 100.0\n 2009-12-31 100.0\n 2010-01-01 100.0\n 2010-01-02 101.0\n 2010-01-03 NaN\n 2010-01-04 100.0\n 2010-01-05 89.0\n 2010-01-06 88.0\n 2010-01-07 NaN\n\n Please note that the ``NaN`` value present in the original DataFrame\n (at index value 2010-01-03) will not be filled by any of the\n value propagation schemes. This is because filling while reindexing\n does not look at DataFrame values, but only compares the original and\n desired indexes. 
If you do want to fill in the ``NaN`` values present\n in the original DataFrame, use the ``fillna()`` method.\n\n See the :ref:`user guide ` for more.\n \"\"\"\n return super().reindex(\n labels=labels,\n index=index,\n columns=columns,\n axis=axis,\n method=method,\n level=level,\n fill_value=fill_value,\n limit=limit,\n tolerance=tolerance,\n copy=copy,\n )\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[True],\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: Literal[False] = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop(\n self,\n labels: IndexLabel | ListLike = ...,\n *,\n axis: Axis = ...,\n index: IndexLabel | ListLike = ...,\n columns: IndexLabel | ListLike = ...,\n level: Level = ...,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def drop(\n self,\n labels: IndexLabel | ListLike = None,\n *,\n axis: Axis = 0,\n index: IndexLabel | ListLike = None,\n columns: IndexLabel | ListLike = None,\n level: Level | None = None,\n inplace: bool | lib.NoDefault = lib.no_default,\n errors: IgnoreRaise = \"raise\",\n ) -> DataFrame | None:\n \"\"\"\n Drop specified labels from rows or columns.\n\n Remove rows or columns by specifying label names and corresponding\n axis, or by directly specifying index or column names. When using a\n multi-index, labels on different levels can be removed by specifying\n the level. See the :ref:`user guide `\n for more information about the now unused levels.\n\n Parameters\n ----------\n labels : single label or iterable of labels\n Index or column labels to drop. 
A tuple will be used as a single\n label and not treated as an iterable.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Whether to drop labels from the index (0 or 'index') or\n columns (1 or 'columns').\n index : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=0``\n is equivalent to ``index=labels``).\n columns : single label or iterable of labels\n Alternative to specifying axis (``labels, axis=1``\n is equivalent to ``columns=labels``).\n level : int or level name, optional\n For MultiIndex, level from which the labels will be removed.\n inplace : bool, default False\n If False, return a copy. Otherwise, do operation\n in place and return None.\n\n .. deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n errors : {'ignore', 'raise'}, default 'raise'\n If 'ignore', suppress error and only existing labels are\n dropped.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the specified index or column labels removed,\n or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis.\n\n See Also\n --------\n DataFrame.loc : Label-location based indexer for selection by label.\n DataFrame.dropna : Return DataFrame with labels on given axis omitted\n where (all or any) data are missing.\n DataFrame.drop_duplicates : Return DataFrame with duplicate rows\n removed, optionally only considering certain columns.\n Series.drop : Return Series with specified index labels removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=[\"A\", \"B\", \"C\", \"D\"])\n >>> df\n A B C D\n 0 0 1 2 3\n 1 4 5 6 7\n 2 8 9 10 11\n\n Drop columns\n\n >>> df.drop([\"B\", \"C\"], axis=1)\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n >>> df.drop(columns=[\"B\", \"C\"])\n A D\n 0 0 3\n 1 4 7\n 2 8 11\n\n Drop a row by index\n\n >>> df.drop([0, 1])\n 
A B C D\n 2 8 9 10 11\n\n Drop columns and/or rows of MultiIndex DataFrame\n\n >>> midx = pd.MultiIndex(\n ... levels=[[\"llama\", \"cow\", \"falcon\"], [\"speed\", \"weight\", \"length\"]],\n ... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],\n ... )\n >>> df = pd.DataFrame(\n ... index=midx,\n ... columns=[\"big\", \"small\"],\n ... data=[\n ... [45, 30],\n ... [200, 100],\n ... [1.5, 1],\n ... [30, 20],\n ... [250, 150],\n ... [1.5, 0.8],\n ... [320, 250],\n ... [1, 0.8],\n ... [0.3, 0.2],\n ... ],\n ... )\n >>> df\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n weight 1.0 0.8\n length 0.3 0.2\n\n Drop a specific index combination from the MultiIndex\n DataFrame, i.e., drop the combination ``'falcon'`` and\n ``'weight'``, which deletes only the corresponding row\n\n >>> df.drop(index=(\"falcon\", \"weight\"))\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n length 1.5 1.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n length 1.5 0.8\n falcon speed 320.0 250.0\n length 0.3 0.2\n\n >>> df.drop(index=\"cow\", columns=\"small\")\n big\n llama speed 45.0\n weight 200.0\n length 1.5\n falcon speed 320.0\n weight 1.0\n length 0.3\n\n >>> df.drop(index=\"length\", level=1)\n big small\n llama speed 45.0 30.0\n weight 200.0 100.0\n cow speed 30.0 20.0\n weight 250.0 150.0\n falcon speed 320.0 250.0\n weight 1.0 0.8\n \"\"\"\n if inplace is not lib.no_default:\n # GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.drop is deprecated \"\n \"and will be removed in a future version. 
\"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n return super().drop(\n labels=labels,\n axis=axis,\n index=index,\n columns=columns,\n level=level,\n inplace=inplace,\n errors=errors,\n )\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[True],\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> None: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: Literal[False] = ...,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame: ...\n\n @overload\n def rename(\n self,\n mapper: Renamer | None = ...,\n *,\n index: Renamer | None = ...,\n columns: Renamer | None = ...,\n axis: Axis | None = ...,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level = ...,\n errors: IgnoreRaise = ...,\n ) -> DataFrame | None: ...\n\n def rename(\n self,\n mapper: Renamer | None = None,\n *,\n index: Renamer | None = None,\n columns: Renamer | None = None,\n axis: Axis | None = None,\n copy: bool | lib.NoDefault = lib.no_default,\n inplace: bool | lib.NoDefault = lib.no_default,\n level: Level | None = None,\n errors: IgnoreRaise = \"ignore\",\n ) -> DataFrame | None:\n \"\"\"\n Rename columns or index labels.\n\n Function / dict values must be unique (1-to-1). Labels not contained in\n a dict / Series will be left as-is. 
Extra labels listed don't throw an\n error.\n\n See the :ref:`user guide ` for more.\n\n Parameters\n ----------\n mapper : dict-like or function\n Dict-like or function transformations to apply to\n that axis' values. Use either ``mapper`` and ``axis`` to\n specify the axis to target with ``mapper``, or ``index`` and\n ``columns``.\n index : dict-like or function\n Alternative to specifying axis (``mapper, axis=0``\n is equivalent to ``index=mapper``).\n columns : dict-like or function\n Alternative to specifying axis (``mapper, axis=1``\n is equivalent to ``columns=mapper``).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Axis to target with ``mapper``. Can be either the axis name\n ('index', 'columns') or number (0, 1). The default is 'index'.\n copy : bool, default False\n This keyword is now ignored; changing its value will have no\n impact on the method.\n\n .. deprecated:: 3.0.0\n\n This keyword is ignored and will be removed in pandas 4.0. Since\n pandas 3.0, this method always returns a new object using a lazy\n copy mechanism that defers copies until necessary\n (Copy-on-Write). See the `user guide on Copy-on-Write\n `__\n for more details.\n\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n If True then value of copy is ignored.\n\n .. 
deprecated:: 3.1.0\n\n This keyword is deprecated and will be removed in pandas 4.0.\n See `PDEP-8 In-place methods in pandas\n `__\n for more details.\n\n level : int or level name, default None\n In case of a MultiIndex, only rename labels in the specified\n level.\n errors : {'ignore', 'raise'}, default 'ignore'\n If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,\n or `columns` contains labels that are not present in the Index\n being transformed.\n If 'ignore', existing keys will be renamed and extra keys will be\n ignored.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the renamed axis labels or None if ``inplace=True``.\n\n Raises\n ------\n KeyError\n If any of the labels is not found in the selected axis and\n \"errors='raise'\".\n\n See Also\n --------\n DataFrame.rename_axis : Set the name of the axis.\n\n Examples\n --------\n ``DataFrame.rename`` supports two calling conventions\n\n * ``(index=index_mapper, columns=columns_mapper, ...)``\n * ``(mapper, axis={'index', 'columns'}, ...)``\n\n We *highly* recommend using keyword arguments to clarify your\n intent.\n\n Rename columns using a mapping:\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [4, 5, 6]})\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"c\"})\n a c\n 0 1 4\n 1 2 5\n 2 3 6\n\n Rename index using a mapping:\n\n >>> df.rename(index={0: \"x\", 1: \"y\", 2: \"z\"})\n A B\n x 1 4\n y 2 5\n z 3 6\n\n Cast index labels to a different type:\n\n >>> df.index\n RangeIndex(start=0, stop=3, step=1)\n >>> df.rename(index=str).index\n Index(['0', '1', '2'], dtype='str')\n\n >>> df.rename(columns={\"A\": \"a\", \"B\": \"b\", \"C\": \"c\"}, errors=\"raise\")\n Traceback (most recent call last):\n KeyError: ['C'] not found in axis\n\n Using axis-style parameters:\n\n >>> df.rename(str.lower, axis=\"columns\")\n a b\n 0 1 4\n 1 2 5\n 2 3 6\n\n >>> df.rename({1: 2, 2: 4}, axis=\"index\")\n A B\n 0 1 4\n 2 2 5\n 4 3 6\n \"\"\"\n\n if inplace is not lib.no_default:\n # 
GH#63207\n warnings.warn(\n \"The inplace keyword in DataFrame.rename is \"\n \"deprecated and will be removed in a future version. \"\n \"See PDEP-8 for more details:\"\n \"https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n inplace = False\n\n self._check_copy_deprecation(copy)\n\n return super()._rename(\n mapper=mapper,\n index=index,\n columns=columns,\n axis=axis,\n inplace=inplace,\n level=level,\n errors=errors,\n )\n\n def pop(self, item: Hashable) -> Series:\n \"\"\"\n Return item and drop it from DataFrame. Raise KeyError if not found.\n\n The column is removed from the DataFrame and returned as a Series;\n the original DataFrame is modified in place unless it was a view.\n\n Parameters\n ----------\n item : label\n Label of column to be popped.\n\n Returns\n -------\n Series\n Series representing the item that is dropped.\n\n See Also\n --------\n DataFrame.drop: Drop specified labels from rows or columns.\n DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [\n ... (\"falcon\", \"bird\", 389.0),\n ... (\"parrot\", \"bird\", 24.0),\n ... (\"lion\", \"mammal\", 80.5),\n ... (\"monkey\", \"mammal\", np.nan),\n ... ],\n ... columns=(\"name\", \"class\", \"max_speed\"),\n ... 
)\n >>> df\n name class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n >>> df.pop(\"class\")\n 0 bird\n 1 bird\n 2 mammal\n 3 mammal\n Name: class, dtype: str\n\n >>> df\n name max_speed\n 0 falcon 389.0\n 1 parrot 24.0\n 2 lion 80.5\n 3 monkey NaN\n \"\"\"\n return super().pop(item=item)\n\n def _replace_columnwise(\n self, mapping: dict[Hashable, tuple[Any, Any]], inplace: bool, regex\n ) -> Self:\n \"\"\"\n Dispatch to Series.replace column-wise.\n\n Parameters\n ----------\n mapping : dict\n of the form {col: (target, value)}\n inplace : bool\n regex : bool or same types as `to_replace` in DataFrame.replace\n\n Returns\n -------\n DataFrame\n \"\"\"\n # Operate column-wise\n res = self if inplace else self.copy(deep=False)\n ax = self.columns\n\n for i, ax_value in enumerate(ax):\n if ax_value in mapping:\n ser = self.iloc[:, i]\n\n target, value = mapping[ax_value]\n newobj = ser.replace(target, value, regex=regex)\n\n res._iset_item(i, newobj, inplace=inplace)\n\n return res if inplace else res.__finalize__(self)\n\n def shift(\n self,\n periods: int | Sequence[int] = 1,\n freq: Frequency | None = None,\n axis: Axis = 0,\n fill_value: Hashable = lib.no_default,\n suffix: str | None = None,\n ) -> DataFrame:\n \"\"\"\n Shift index by desired number of periods with an optional time `freq`.\n\n When `freq` is not passed, shift the index without realigning the data.\n If `freq` is passed (in this case, the index must be date or datetime,\n or it will raise a `NotImplementedError`), the index will be\n increased using the periods and the `freq`. `freq` can be inferred\n when specified as \"infer\" as long as either freq or inferred_freq\n attribute is set in the index.\n\n Parameters\n ----------\n periods : int or Sequence\n Number of periods to shift. 
Can be positive or negative.\n If an iterable of ints, the data will be shifted once by each int.\n This is equivalent to shifting by one value at a time and\n concatenating all resulting frames. The resulting columns will have\n the shift suffixed to their column names. For multiple periods,\n axis must not be 1.\n freq : DateOffset, tseries.offsets, timedelta, or str, optional\n Offset to use from the tseries module or time rule (e.g. 'EOM').\n If `freq` is specified then the index values are shifted but the\n data is not realigned. That is, use `freq` if you would like to\n extend the index when shifting and preserve the original data.\n If `freq` is specified as \"infer\" then it will be inferred from\n the freq or inferred_freq attributes of the index. If neither of\n those attributes exists, a ValueError is thrown.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Shift direction. For `Series` this parameter is unused and defaults to 0.\n fill_value : object, optional\n The scalar value to use for newly introduced missing values.\n The default depends on the dtype of `self`.\n For Boolean and numeric NumPy data types, ``np.nan`` is used.\n For datetime, timedelta, or period data, :attr:`NaT` is used.\n For extension dtypes, ``self.dtype.na_value`` is used.\n suffix : str, optional\n If str and periods is an iterable, this is added after the column\n name and before the shift value for each shifted column name.\n For `Series` this parameter is unused and defaults to `None`.\n\n Returns\n -------\n DataFrame\n Copy of input object, shifted.\n\n See Also\n --------\n Index.shift : Shift values of Index.\n DatetimeIndex.shift : Shift values of DatetimeIndex.\n PeriodIndex.shift : Shift values of PeriodIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [[10, 13, 17], [20, 23, 27], [15, 18, 22], [30, 33, 37], [45, 48, 52]],\n ... columns=[\"Col1\", \"Col2\", \"Col3\"],\n ... index=pd.date_range(\"2020-01-01\", \"2020-01-05\"),\n ... 
)\n >>> df\n Col1 Col2 Col3\n 2020-01-01 10 13 17\n 2020-01-02 20 23 27\n 2020-01-03 15 18 22\n 2020-01-04 30 33 37\n 2020-01-05 45 48 52\n\n >>> df.shift(periods=3)\n Col1 Col2 Col3\n 2020-01-01 NaN NaN NaN\n 2020-01-02 NaN NaN NaN\n 2020-01-03 NaN NaN NaN\n 2020-01-04 10.0 13.0 17.0\n 2020-01-05 20.0 23.0 27.0\n\n >>> df.shift(periods=1, axis=\"columns\")\n Col1 Col2 Col3\n 2020-01-01 NaN 10 13\n 2020-01-02 NaN 20 23\n 2020-01-03 NaN 15 18\n 2020-01-04 NaN 30 33\n 2020-01-05 NaN 45 48\n\n >>> df.shift(periods=3, fill_value=0)\n Col1 Col2 Col3\n 2020-01-01 0 0 0\n 2020-01-02 0 0 0\n 2020-01-03 0 0 0\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n\n >>> df.shift(periods=3, freq=\"D\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df.shift(periods=3, freq=\"infer\")\n Col1 Col2 Col3\n 2020-01-04 10 13 17\n 2020-01-05 20 23 27\n 2020-01-06 15 18 22\n 2020-01-07 30 33 37\n 2020-01-08 45 48 52\n\n >>> df[\"Col1\"].shift(periods=[0, 1, 2])\n Col1_0 Col1_1 Col1_2\n 2020-01-01 10 NaN NaN\n 2020-01-02 20 10.0 NaN\n 2020-01-03 15 20.0 10.0\n 2020-01-04 30 15.0 20.0\n 2020-01-05 45 30.0 15.0\n \"\"\"\n if freq is not None and fill_value is not lib.no_default:\n # GH#53832\n raise ValueError(\n \"Passing a 'freq' together with a 'fill_value' is not allowed.\"\n )\n\n if self.empty and freq is None:\n return self.copy()\n\n axis = self._get_axis_number(axis)\n\n if is_list_like(periods):\n periods = cast(\"Sequence\", periods)\n if axis == 1:\n raise ValueError(\n \"If `periods` contains multiple shifts, `axis` cannot be 1.\"\n )\n if len(periods) == 0:\n raise ValueError(\"If `periods` is an iterable, it cannot be empty.\")\n from pandas.core.reshape.concat import concat\n\n shifted_dataframes = []\n for period in periods:\n if not is_integer(period):\n raise TypeError(\n f\"Periods must be integer, but {period} is {type(period)}.\"\n )\n shifted_dataframes.append(\n super()\n 
.shift(periods=period, freq=freq, axis=axis, fill_value=fill_value)\n .add_suffix(f\"{suffix}_{period}\" if suffix else f\"_{period}\")\n )\n return concat(shifted_dataframes, axis=1, sort=False)\n elif suffix:\n raise ValueError(\"Cannot specify `suffix` if `periods` is an int.\")\n periods = cast(\"int\", periods)\n\n ncols = len(self.columns)\n if axis == 1 and periods != 0 and ncols > 0 and freq is None:\n if fill_value is lib.no_default:\n # We will infer fill_value to match the closest column\n\n # Use a column that we know is valid for our column's dtype GH#38434\n label = self.columns[0]\n\n if periods > 0:\n result = self.iloc[:, :-periods]\n for col in range(min(ncols, abs(periods))):\n # TODO(EA2D): doing this in a loop unnecessary with 2D EAs\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, 0].shift(len(self))\n result.insert(0, label, filler, allow_duplicates=True)\n else:\n result = self.iloc[:, -periods:]\n for col in range(min(ncols, abs(periods))):\n # Define filler inside loop so we get a copy\n filler = self.iloc[:, -1].shift(len(self))\n result.insert(\n len(result.columns), label, filler, allow_duplicates=True\n )\n\n result.columns = self.columns.copy()\n return result\n elif len(self._mgr.blocks) > 1 or (\n # If we only have one block and we know that we can't\n # keep the same dtype (i.e. 
the _can_hold_element check)\n # then we can go through the reindex_indexer path\n # (and avoid casting logic in the Block method).\n not can_hold_element(self._mgr.blocks[0].values, fill_value)\n ):\n # GH#35488 we need to watch out for multi-block cases\n # We only get here with fill_value not-lib.no_default\n nper = abs(periods)\n nper = min(nper, ncols)\n if periods > 0:\n indexer = np.array(\n [-1] * nper + list(range(ncols - periods)), dtype=np.intp\n )\n else:\n indexer = np.array(\n list(range(nper, ncols)) + [-1] * nper, dtype=np.intp\n )\n mgr = self._mgr.reindex_indexer(\n self.columns,\n indexer,\n axis=0,\n fill_value=fill_value,\n allow_dups=True,\n )\n res_df = self._constructor_from_mgr(mgr, axes=mgr.axes)\n return res_df.__finalize__(self, method=\"shift\")\n else:\n return self.T.shift(periods=periods, fill_value=fill_value).T\n\n return super().shift(\n periods=periods, freq=freq, axis=axis, fill_value=fill_value\n )\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[False] = ...,\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> DataFrame: ...\n\n @overload\n def set_index(\n self,\n keys,\n *,\n drop: bool = ...,\n append: bool = ...,\n inplace: Literal[True],\n verify_integrity: bool | lib.NoDefault = ...,\n ) -> None: ...\n\n def set_index(\n self,\n keys,\n *,\n drop: bool = True,\n append: bool = False,\n inplace: bool = False,\n verify_integrity: bool | lib.NoDefault = lib.no_default,\n ) -> DataFrame | None:\n \"\"\"\n Set the DataFrame index using existing columns.\n\n Set the DataFrame index (row labels) using one or more existing\n columns or arrays (of the correct length). 
The index can replace the\n existing index or expand on it.\n\n Parameters\n ----------\n keys : label or array-like or list of labels/arrays\n This parameter can be either a single column key, a single array of\n the same length as the calling DataFrame, or a list containing an\n arbitrary combination of column keys and arrays. Here, \"array\"\n encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and\n instances of :class:`~collections.abc.Iterator`.\n drop : bool, default True\n Delete columns to be used as the new index.\n append : bool, default False\n Whether to append columns to existing index.\n Setting to True will add the new columns to existing index.\n When set to False, the current index will be dropped from the DataFrame.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n verify_integrity : bool, default False\n Check the new index for duplicates. Otherwise defer the check until\n necessary. Setting to False will improve the performance of this\n method.\n\n .. deprecated:: 3.0.0\n\n The ``verify_integrity`` keyword is deprecated and will be\n removed in a future version. Check ``result.index.is_unique``\n directly instead.\n\n Returns\n -------\n DataFrame or None\n Changed row labels or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.reset_index : Opposite of set_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"month\": [1, 4, 7, 10],\n ... \"year\": [2012, 2014, 2013, 2014],\n ... \"sale\": [55, 40, 84, 31],\n ... }\n ... 
)\n >>> df\n month year sale\n 0 1 2012 55\n 1 4 2014 40\n 2 7 2013 84\n 3 10 2014 31\n\n Set the index to become the 'month' column:\n\n >>> df.set_index(\"month\")\n year sale\n month\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n Create a MultiIndex using columns 'year' and 'month':\n\n >>> df.set_index([\"year\", \"month\"])\n sale\n year month\n 2012 1 55\n 2014 4 40\n 2013 7 84\n 2014 10 31\n\n Create a MultiIndex using an Index and a column:\n\n >>> df.set_index([pd.Index([1, 2, 3, 4]), \"year\"])\n month sale\n year\n 1 2012 1 55\n 2 2014 4 40\n 3 2013 7 84\n 4 2014 10 31\n\n Create a MultiIndex using two Series:\n\n >>> s = pd.Series([1, 2, 3, 4])\n >>> df.set_index([s, s**2])\n month year sale\n 1 1 1 2012 55\n 2 4 4 2014 40\n 3 9 7 2013 84\n 4 16 10 2014 31\n\n Append a column to the existing index:\n\n >>> df = df.set_index(\"month\")\n >>> df.set_index(\"year\", append=True)\n sale\n month year\n 1 2012 55\n 4 2014 40\n 7 2013 84\n 10 2014 31\n\n >>> df.set_index(\"year\", append=False)\n sale\n year\n 2012 55\n 2014 40\n 2013 84\n 2014 31\n \"\"\"\n if verify_integrity is not lib.no_default:\n # GH#62919\n warnings.warn(\n \"The 'verify_integrity' keyword in DataFrame.set_index is \"\n \"deprecated and will be removed in a future version. 
\"\n \"Directly check the result.index.is_unique instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n else:\n verify_integrity = False\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if not isinstance(keys, list):\n keys = [keys]\n\n err_msg = (\n 'The parameter \"keys\" may be a column key, one-dimensional '\n \"array, or a list containing only valid column keys and \"\n \"one-dimensional arrays.\"\n )\n\n missing: list[Hashable] = []\n for col in keys:\n if isinstance(col, (Index, Series, np.ndarray, list, abc.Iterator)):\n # arrays are fine as long as they are one-dimensional\n # iterators get converted to list below\n if getattr(col, \"ndim\", 1) != 1:\n raise ValueError(err_msg)\n else:\n # everything else gets tried as a key; see GH 24969\n try:\n found = col in self.columns\n except TypeError as err:\n raise TypeError(\n f\"{err_msg}. Received column of type {type(col)}\"\n ) from err\n else:\n if not found:\n missing.append(col)\n\n if missing:\n raise KeyError(f\"None of {missing} are in the columns\")\n\n if inplace:\n frame = self\n else:\n frame = self.copy(deep=False)\n\n arrays: list[Index] = []\n names: list[Hashable] = []\n if append:\n names = list(self.index.names)\n if isinstance(self.index, MultiIndex):\n arrays.extend(\n self.index._get_level_values(i) for i in range(self.index.nlevels)\n )\n else:\n arrays.append(self.index)\n\n to_remove: set[Hashable] = set()\n for col in keys:\n if isinstance(col, MultiIndex):\n arrays.extend(col._get_level_values(n) for n in range(col.nlevels))\n names.extend(col.names)\n elif isinstance(col, (Index, Series)):\n # if Index then not MultiIndex (treated above)\n\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"Union[Index, Series]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(col.name)\n elif isinstance(col, (list, np.ndarray)):\n # error: Argument 1 to 
\"append\" of \"list\" has incompatible type\n # \"Union[List[Any], ndarray]\"; expected \"Index\"\n arrays.append(col) # type: ignore[arg-type]\n names.append(None)\n elif isinstance(col, abc.Iterator):\n # error: Argument 1 to \"append\" of \"list\" has incompatible type\n # \"List[Any]\"; expected \"Index\"\n arrays.append(list(col)) # type: ignore[arg-type]\n names.append(None)\n # from here, col can only be a column label\n else:\n arrays.append(frame[col])\n names.append(col)\n if drop:\n to_remove.add(col)\n\n if len(arrays[-1]) != len(self):\n # check newest element against length of calling frame, since\n # ensure_index_from_sequences would not raise for append=False.\n raise ValueError(\n f\"Length mismatch: Expected {len(self)} rows, \"\n f\"received array of length {len(arrays[-1])}\"\n )\n\n index = ensure_index_from_sequences(arrays, names)\n\n if verify_integrity and not index.is_unique:\n duplicates = index[index.duplicated()].unique()\n raise ValueError(f\"Index has duplicate keys: {duplicates}\")\n\n # use set to handle duplicate column names gracefully in case of drop\n for c in to_remove:\n del frame[c]\n\n # clear up memory usage\n index._cleanup()\n\n frame.index = index\n\n if not inplace:\n return frame\n return None\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[False] = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: Literal[True],\n col_level: Hashable = ...,\n col_fill: Hashable = ...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> None: ...\n\n @overload\n def reset_index(\n self,\n level: IndexLabel = ...,\n *,\n drop: bool = ...,\n inplace: bool = ...,\n col_level: Hashable = ...,\n col_fill: Hashable = 
...,\n allow_duplicates: bool = ...,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None: ...\n\n def reset_index(\n self,\n level: IndexLabel | None = None,\n *,\n drop: bool = False,\n inplace: bool = False,\n col_level: Hashable = 0,\n col_fill: Hashable = \"\",\n allow_duplicates: bool = False,\n names: Hashable | Sequence[Hashable] | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Reset the index, or a level of it.\n\n Reset the index of the DataFrame, and use the default one instead.\n If the DataFrame has a MultiIndex, this method can remove one or more\n levels.\n\n Parameters\n ----------\n level : int, str, tuple, or list, default None\n Only remove the given levels from the index. Removes all levels by\n default.\n drop : bool, default False\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n col_level : int or str, default 0\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the first\n level.\n col_fill : object, default ''\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.\n allow_duplicates : bool, default False\n Allow duplicate column labels to be created.\n names : int, str or 1-dimensional list, default None\n Using the given string, rename the DataFrame column which contains the\n index data. 
If the DataFrame has a MultiIndex, this has to be a list\n with length equal to the number of levels.\n\n Returns\n -------\n DataFrame or None\n DataFrame with the new index or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.set_index : Opposite of reset_index.\n DataFrame.reindex : Change to new indices or expand indices.\n DataFrame.reindex_like : Change to same indices as other DataFrame.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [(\"bird\", 389.0), (\"bird\", 24.0), (\"mammal\", 80.5), (\"mammal\", np.nan)],\n ... index=[\"falcon\", \"parrot\", \"lion\", \"monkey\"],\n ... columns=(\"class\", \"max_speed\"),\n ... )\n >>> df\n class max_speed\n falcon bird 389.0\n parrot bird 24.0\n lion mammal 80.5\n monkey mammal NaN\n\n When we reset the index, the old index is added as a column, and a\n new sequential index is used:\n\n >>> df.reset_index()\n index class max_speed\n 0 falcon bird 389.0\n 1 parrot bird 24.0\n 2 lion mammal 80.5\n 3 monkey mammal NaN\n\n We can use the `drop` parameter to avoid the old index being added as\n a column:\n\n >>> df.reset_index(drop=True)\n class max_speed\n 0 bird 389.0\n 1 bird 24.0\n 2 mammal 80.5\n 3 mammal NaN\n\n You can also use `reset_index` with `MultiIndex`.\n\n >>> index = pd.MultiIndex.from_tuples(\n ... [\n ... (\"bird\", \"falcon\"),\n ... (\"bird\", \"parrot\"),\n ... (\"mammal\", \"lion\"),\n ... (\"mammal\", \"monkey\"),\n ... ],\n ... names=[\"class\", \"name\"],\n ... )\n >>> columns = pd.MultiIndex.from_tuples([(\"speed\", \"max\"), (\"species\", \"type\")])\n >>> df = pd.DataFrame(\n ... [(389.0, \"fly\"), (24.0, \"fly\"), (80.5, \"run\"), (np.nan, \"jump\")],\n ... index=index,\n ... columns=columns,\n ... 
)\n >>> df\n speed species\n max type\n class name\n bird falcon 389.0 fly\n parrot 24.0 fly\n mammal lion 80.5 run\n monkey NaN jump\n\n Using the `names` parameter, choose a name for the index column:\n\n >>> df.reset_index(names=[\"classes\", \"names\"])\n classes names speed species\n max type\n 0 bird falcon 389.0 fly\n 1 bird parrot 24.0 fly\n 2 mammal lion 80.5 run\n 3 mammal monkey NaN jump\n\n If the index has multiple levels, we can reset a subset of them:\n\n >>> df.reset_index(level=\"class\")\n class speed species\n max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we are not dropping the index, by default, it is placed in the top\n level. We can place it in another level:\n\n >>> df.reset_index(level=\"class\", col_level=1)\n speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n When the index is inserted under another level, we can specify under\n which one with the parameter `col_fill`:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"species\")\n species speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n\n If we specify a nonexistent level for `col_fill`, it is created:\n\n >>> df.reset_index(level=\"class\", col_level=1, col_fill=\"genus\")\n genus speed species\n class max type\n name\n falcon bird 389.0 fly\n parrot bird 24.0 fly\n lion mammal 80.5 run\n monkey mammal NaN jump\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n self._check_inplace_and_allows_duplicate_labels(inplace)\n if inplace:\n new_obj = self\n else:\n new_obj = self.copy(deep=False)\n allow_duplicates = validate_bool_kwarg(allow_duplicates, \"allow_duplicates\")\n\n new_index = default_index(len(new_obj))\n if level is not None:\n if not isinstance(level, (tuple, list)):\n level = [level]\n level = 
[self.index._get_level_number(lev) for lev in level]\n if len(level) < self.index.nlevels:\n new_index = self.index.droplevel(level) # type: ignore[assignment]\n\n if not drop:\n to_insert: Iterable[tuple[Any, Any | None]]\n\n default = \"index\" if \"index\" not in self else \"level_0\"\n names = self.index._get_default_index_names(names, default) # type: ignore[arg-type]\n\n if isinstance(self.index, MultiIndex):\n to_insert = zip(\n reversed(self.index.levels),\n reversed(self.index.codes),\n strict=True,\n )\n else:\n to_insert = ((self.index, None),)\n\n multi_col = isinstance(self.columns, MultiIndex)\n for j, (lev, lab) in enumerate(to_insert, start=1):\n i = self.index.nlevels - j\n if level is not None and i not in level:\n continue\n name = names[i]\n if multi_col:\n col_name = list(name) if isinstance(name, tuple) else [name]\n if col_fill is None:\n if len(col_name) not in (1, self.columns.nlevels):\n raise ValueError(\n \"col_fill=None is incompatible \"\n f\"with incomplete column name {name}\"\n )\n col_fill = col_name[0]\n\n lev_num = self.columns._get_level_number(col_level)\n name_lst = [col_fill] * lev_num + col_name\n missing = self.columns.nlevels - len(name_lst)\n name_lst += [col_fill] * missing\n name = tuple(name_lst)\n\n # to ndarray and maybe infer different dtype\n level_values = lev._values\n if level_values.dtype == np.object_:\n level_values = lib.maybe_convert_objects(level_values)\n\n if lab is not None:\n # if we have the codes, extract the values with a mask\n level_values = algorithms.take(\n level_values, lab, allow_fill=True, fill_value=lev._na_value\n )\n\n new_obj.insert(\n 0,\n name,\n level_values,\n allow_duplicates=allow_duplicates,\n )\n\n new_obj.index = new_index\n if not inplace:\n return new_obj\n\n return None\n\n # ----------------------------------------------------------------------\n # Reindex-based selection methods\n\n def isna(self) -> DataFrame:\n \"\"\"\n Detect missing values.\n\n Return a boolean 
same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n res_mgr = self._mgr.isna(func=isna)\n result = self._constructor_from_mgr(res_mgr, axes=res_mgr.axes)\n return result.__finalize__(self, method=\"isna\")\n\n def isnull(self) -> DataFrame:\n \"\"\"\n DataFrame.isnull is an alias for DataFrame.isna.\n\n Detect missing values.\n\n Return a boolean same-sized object indicating if the values are NA.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to True\n values.\n Everything else gets mapped to False values. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is an NA value.\n\n See Also\n --------\n Series.isnull : Alias of isna.\n DataFrame.isnull : Alias of isna.\n Series.notna : Boolean inverse of isna.\n DataFrame.notna : Boolean inverse of isna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n isna : Top-level isna.\n\n Examples\n --------\n Show which entries in a DataFrame are NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.isna()\n age born name toy\n 0 False True False True\n 1 False False False False\n 2 True False False False\n\n Show which entries in a Series are NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.isna()\n 0 False\n 1 False\n 2 True\n dtype: bool\n \"\"\"\n return self.isna()\n\n def notna(self) -> DataFrame:\n \"\"\"\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notna()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notna()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n def notnull(self) -> DataFrame:\n \"\"\"\n DataFrame.notnull is an alias for DataFrame.notna.\n\n Detect existing (non-missing) values.\n\n Return a boolean same-sized object indicating if the values are not NA.\n Non-missing values get mapped to True. 
Characters such as empty\n strings ``''`` or :attr:`numpy.inf` are not considered NA values.\n NA values, such as None or :attr:`numpy.NaN`, get mapped to False\n values.\n\n Returns\n -------\n Series/DataFrame\n Mask of bool values for each element in Series/DataFrame\n that indicates whether an element is not an NA value.\n\n See Also\n --------\n Series.notnull : Alias of notna.\n DataFrame.notnull : Alias of notna.\n Series.isna : Boolean inverse of notna.\n DataFrame.isna : Boolean inverse of notna.\n Series.dropna : Omit axes labels with missing values.\n DataFrame.dropna : Omit axes labels with missing values.\n notna : Top-level notna.\n\n Examples\n --------\n Show which entries in a DataFrame are not NA.\n\n >>> df = pd.DataFrame(\n ... dict(\n ... age=[5, 6, np.nan],\n ... born=[\n ... pd.NaT,\n ... pd.Timestamp(\"1939-05-27\"),\n ... pd.Timestamp(\"1940-04-25\"),\n ... ],\n ... name=[\"Alfred\", \"Batman\", \"\"],\n ... toy=[None, \"Batmobile\", \"Joker\"],\n ... )\n ... )\n >>> df\n age born name toy\n 0 5.0 NaT Alfred NaN\n 1 6.0 1939-05-27 Batman Batmobile\n 2 NaN 1940-04-25 Joker\n\n >>> df.notnull()\n age born name toy\n 0 True False True False\n 1 True True True True\n 2 False True True True\n\n Show which entries in a Series are not NA.\n\n >>> ser = pd.Series([5, 6, np.nan])\n >>> ser\n 0 5.0\n 1 6.0\n 2 NaN\n dtype: float64\n\n >>> ser.notnull()\n 0 True\n 1 True\n 2 False\n dtype: bool\n \"\"\"\n return ~self.isna()\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def dropna(\n self,\n *,\n axis: Axis = ...,\n how: AnyAll | lib.NoDefault = ...,\n thresh: int | lib.NoDefault = ...,\n subset: IndexLabel = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n def dropna(\n self,\n *,\n axis: Axis = 0,\n how: 
AnyAll | lib.NoDefault = lib.no_default,\n thresh: int | lib.NoDefault = lib.no_default,\n subset: IndexLabel | AnyArrayLike | None = None,\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Remove missing values.\n\n See the :ref:`User Guide ` for more on which values are\n considered missing, and how to work with missing data.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Determine if rows or columns which contain missing values are\n removed.\n\n * 0, or 'index' : Drop rows which contain missing values.\n * 1, or 'columns' : Drop columns which contain missing values.\n\n Only a single axis is allowed.\n\n how : {'any', 'all'}, default 'any'\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.\n\n * 'any' : If any NA values are present, drop that row or column.\n * 'all' : If all values are NA, drop that row or column.\n\n thresh : int, optional\n Require that many non-NA values. Cannot be combined with how.\n subset : column label or iterable of labels, optional\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n .. versionadded:: 2.0.0\n\n Returns\n -------\n DataFrame or None\n DataFrame with NA entries dropped from it or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.isna : Indicate missing values.\n DataFrame.notna : Indicate existing (non-missing) values.\n DataFrame.fillna : Replace missing values.\n Series.dropna : Drop missing values.\n Index.dropna : Drop missing indices.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"name\": [\"Alfred\", \"Batman\", \"Catwoman\"],\n ... \"toy\": [np.nan, \"Batmobile\", \"Bullwhip\"],\n ... 
\"born\": [pd.NaT, pd.Timestamp(\"1940-04-25\"), pd.NaT],\n ... }\n ... )\n >>> df\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Drop the rows where at least one element is missing.\n\n >>> df.dropna()\n name toy born\n 1 Batman Batmobile 1940-04-25\n\n Drop the columns where at least one element is missing.\n\n >>> df.dropna(axis=\"columns\")\n name\n 0 Alfred\n 1 Batman\n 2 Catwoman\n\n Drop the rows where all elements are missing.\n\n >>> df.dropna(how=\"all\")\n name toy born\n 0 Alfred NaN NaT\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Keep only the rows with at least 2 non-NA values.\n\n >>> df.dropna(thresh=2)\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n\n Define in which columns to look for missing values.\n\n >>> df.dropna(subset=[\"name\", \"toy\"])\n name toy born\n 1 Batman Batmobile 1940-04-25\n 2 Catwoman Bullwhip NaT\n \"\"\"\n if (how is not lib.no_default) and (thresh is not lib.no_default):\n raise TypeError(\n \"You cannot set both the how and thresh arguments at the same time.\"\n )\n\n if how is lib.no_default:\n how = \"any\"\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n if isinstance(axis, (tuple, list)):\n # GH20987\n raise TypeError(\"supplying multiple axes to axis is no longer supported.\")\n\n axis = self._get_axis_number(axis)\n agg_axis = 1 - axis\n\n agg_obj = self\n if subset is not None:\n # subset needs to be list\n if not is_list_like(subset):\n subset = [cast(\"Hashable\", subset)]\n ax = self._get_axis(agg_axis)\n indices = ax.get_indexer_for(subset) # type: ignore[arg-type]\n check = indices == -1\n if check.any():\n raise KeyError(np.array(subset)[check].tolist())\n agg_obj = self.take(indices, axis=agg_axis)\n\n if thresh is not lib.no_default:\n count = agg_obj.count(axis=agg_axis)\n mask = count >= thresh\n elif how == \"any\":\n # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'\n notna_obj 
= notna(agg_obj)\n if agg_axis == 1:\n # Cast to bool to avoid slow EA groupby fallback (GH#60179)\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.all(axis=agg_axis, bool_only=False)\n elif how == \"all\":\n # faster equivalent to 'agg_obj.count(agg_axis) > 0'\n notna_obj = notna(agg_obj)\n if agg_axis == 1:\n notna_obj = notna_obj.astype(bool)\n mask = notna_obj.any(axis=agg_axis, bool_only=False)\n else:\n raise ValueError(f\"invalid how option: {how}\")\n\n if np.all(mask):\n result = self.copy(deep=False)\n else:\n result = self.loc(axis=axis)[mask]\n\n if ignore_index:\n result.index = default_index(len(result))\n\n if not inplace:\n return result\n self._update_inplace(result)\n return None\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[True],\n ignore_index: bool = ...,\n ) -> None: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: Literal[False] = ...,\n ignore_index: bool = ...,\n ) -> DataFrame: ...\n\n @overload\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = ...,\n *,\n keep: DropKeep = ...,\n inplace: bool = ...,\n ignore_index: bool = ...,\n ) -> DataFrame | None: ...\n\n def drop_duplicates(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n *,\n keep: DropKeep = \"first\",\n inplace: bool = False,\n ignore_index: bool = False,\n ) -> DataFrame | None:\n \"\"\"\n Return DataFrame with duplicate rows removed.\n\n Considering certain columns is optional. 
Indexes, including time indexes\n are ignored.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', ``False``}, default 'first'\n Determines which duplicates (if any) to keep.\n\n - 'first' : Drop duplicates except for the first occurrence.\n - 'last' : Drop duplicates except for the last occurrence.\n - ``False`` : Drop all duplicates.\n\n inplace : bool, default ``False``\n Whether to modify the DataFrame rather than creating a new one.\n ignore_index : bool, default ``False``\n If ``True``, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n\n Returns\n -------\n DataFrame or None\n DataFrame with duplicates removed or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.value_counts: Count unique combinations of columns.\n\n Notes\n -----\n This method requires columns specified by ``subset`` to be of hashable type.\n Passing unhashable columns will raise a ``TypeError``.\n\n Examples\n --------\n Consider dataset containing ramen rating.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... 
)\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, it removes duplicate rows based on all columns.\n\n >>> df.drop_duplicates()\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n To remove duplicates on specific column(s), use ``subset``.\n\n >>> df.drop_duplicates(subset=[\"brand\"])\n brand style rating\n 0 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n\n To remove duplicates and keep last occurrences, use ``keep``.\n\n >>> df.drop_duplicates(subset=[\"brand\", \"style\"], keep=\"last\")\n brand style rating\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 4 Indomie pack 5.0\n \"\"\"\n if self.empty:\n return self.copy(deep=False)\n\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n ignore_index = validate_bool_kwarg(ignore_index, \"ignore_index\")\n\n result = self[-self.duplicated(subset, keep=keep)]\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n self._update_inplace(result)\n return None\n else:\n return result\n\n def duplicated(\n self,\n subset: Hashable | Iterable[Hashable] | None = None,\n keep: DropKeep = \"first\",\n ) -> Series:\n \"\"\"\n Return boolean Series denoting duplicate rows.\n\n Considering certain columns is optional.\n\n Parameters\n ----------\n subset : column label or iterable of labels, optional\n Only consider certain columns for identifying duplicates, by\n default use all of the columns.\n keep : {'first', 'last', False}, default 'first'\n Determines which duplicates (if any) to mark.\n\n - ``first`` : Mark duplicates as ``True`` except for the first occurrence.\n - ``last`` : Mark duplicates as ``True`` except for the last occurrence.\n - False : Mark all duplicates as ``True``.\n\n Returns\n -------\n Series\n Boolean series for each duplicated rows.\n\n See Also\n --------\n Index.duplicated : Equivalent method on index.\n Series.duplicated : Equivalent 
method on Series.\n Series.drop_duplicates : Remove duplicate values from Series.\n DataFrame.drop_duplicates : Remove duplicate values from DataFrame.\n\n Examples\n --------\n Consider a dataset containing ramen ratings.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"brand\": [\"Yum Yum\", \"Yum Yum\", \"Indomie\", \"Indomie\", \"Indomie\"],\n ... \"style\": [\"cup\", \"cup\", \"cup\", \"pack\", \"pack\"],\n ... \"rating\": [4, 4, 3.5, 15, 5],\n ... }\n ... )\n >>> df\n brand style rating\n 0 Yum Yum cup 4.0\n 1 Yum Yum cup 4.0\n 2 Indomie cup 3.5\n 3 Indomie pack 15.0\n 4 Indomie pack 5.0\n\n By default, for each set of duplicated values, the first occurrence\n is marked ``False`` and all others ``True``.\n\n >>> df.duplicated()\n 0 False\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By using 'last', the last occurrence of each set of duplicated values\n is marked ``False`` and all others ``True``.\n\n >>> df.duplicated(keep=\"last\")\n 0 True\n 1 False\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n By setting ``keep`` to ``False``, all duplicates are marked ``True``.\n\n >>> df.duplicated(keep=False)\n 0 True\n 1 True\n 2 False\n 3 False\n 4 False\n dtype: bool\n\n To find duplicates on specific column(s), use ``subset``.\n\n >>> df.duplicated(subset=[\"brand\"])\n 0 False\n 1 True\n 2 False\n 3 True\n 4 True\n dtype: bool\n \"\"\"\n\n if self.empty:\n return self._constructor_sliced(False, dtype=bool, index=self.index)\n\n def f(vals) -> tuple[np.ndarray, int]:\n labels, shape = algorithms.factorize(vals, size_hint=len(self))\n return labels.astype(\"i8\"), len(shape)\n\n if subset is None:\n subset = self.columns\n elif (\n not np.iterable(subset)\n or isinstance(subset, str)\n or (isinstance(subset, tuple) and subset in self.columns)\n ):\n subset = (subset,)\n\n # needed for mypy since can't narrow types using np.iterable\n subset = cast(\"Sequence\", subset)\n\n # Verify all columns in subset exist in the queried dataframe\n # Otherwise, raise a KeyError, same as if you try 
to __getitem__ with a\n # key that doesn't exist.\n diff = set(subset) - set(self.columns)\n if diff:\n raise KeyError(Index(diff))\n\n if len(subset) == 1 and self.columns.is_unique:\n # GH#45236 This is faster than get_group_index below\n result = self[next(iter(subset))].duplicated(keep)\n result.name = None\n else:\n vals = (col.values for name, col in self.items() if name in subset)\n labels, shape = map(list, zip(*map(f, vals), strict=True))\n\n ids = get_group_index(labels, tuple(shape), sort=False, xnull=False)\n result = self._constructor_sliced(duplicated(ids, keep), index=self.index)\n return result.__finalize__(self, method=\"duplicated\")\n\n # ----------------------------------------------------------------------\n # Sorting\n # error: Signature of \"sort_values\" incompatible with supertype \"NDFrame\"\n @overload # type: ignore[override]\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = ...,\n ascending=...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: str = ...,\n ignore_index: bool = ...,\n key: ValueKeyFunc = ...,\n ) -> None: ...\n\n def sort_values(\n self,\n by: IndexLabel,\n *,\n axis: Axis = 0,\n ascending: bool | list[bool] | tuple[bool, ...] 
= True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: str = \"last\",\n ignore_index: bool = False,\n key: ValueKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort by the values along either axis.\n\n This method sorts the DataFrame by the values in one or more columns\n or by index/column labels.\n\n Parameters\n ----------\n by : str or list of str\n Name or list of names to sort by.\n\n - if `axis` is 0 or `'index'` then `by` may contain index\n levels and/or column labels.\n - if `axis` is 1 or `'columns'` then `by` may contain column\n levels and/or index labels.\n axis : \"{0 or 'index', 1 or 'columns'}\", default 0\n Axis to be sorted.\n ascending : bool or list of bool, default True\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.\n inplace : bool, default False\n If True, perform operation in-place.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n end.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n Apply the key function to the values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect a\n ``Series`` and return a Series with the same shape as the input.\n It will be applied to each column in `by` independently. 
The values in the\n returned Series will be used as the keys for sorting.\n\n Returns\n -------\n DataFrame or None\n DataFrame with sorted values or None if ``inplace=True``.\n\n See Also\n --------\n DataFrame.sort_index : Sort a DataFrame by the index.\n Series.sort_values : Similar method for a Series.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"A\", \"A\", \"B\", np.nan, \"D\", \"C\"],\n ... \"col2\": [2, 1, 9, 8, 7, 4],\n ... \"col3\": [0, 1, 9, 4, 2, 3],\n ... \"col4\": [\"a\", \"B\", \"c\", \"D\", \"e\", \"F\"],\n ... }\n ... )\n >>> df\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n **Sort by a single column**\n\n In this case, we are sorting the rows according to values in ``col1``:\n\n >>> df.sort_values(by=[\"col1\"])\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort by multiple columns**\n\n You can also provide multiple columns to ``by`` argument, as shown below.\n In this example, the rows are first sorted according to ``col1``, and then\n the rows that have an identical value in ``col1`` are sorted according\n to ``col2``.\n\n >>> df.sort_values(by=[\"col1\", \"col2\"])\n col1 col2 col3 col4\n 1 A 1 1 B\n 0 A 2 0 a\n 2 B 9 9 c\n 5 C 4 3 F\n 4 D 7 2 e\n 3 NaN 8 4 D\n\n **Sort in a descending order**\n\n The sort order can be reversed using ``ascending`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False)\n col1 col2 col3 col4\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n 3 NaN 8 4 D\n\n **Placing any** ``NA`` **first**\n\n Note that in the above example, the rows that contain an ``NA`` value in their\n ``col1`` are placed at the end of the dataframe. 
This behavior can be modified\n via ``na_position`` argument, as shown below:\n\n >>> df.sort_values(by=\"col1\", ascending=False, na_position=\"first\")\n col1 col2 col3 col4\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n 2 B 9 9 c\n 0 A 2 0 a\n 1 A 1 1 B\n\n **Customized sort order**\n\n The ``key`` argument allows for a further customization of sorting behaviour.\n For example, you may want\n to ignore the letter's case\n when sorting strings:\n\n >>> df.sort_values(by=\"col4\", key=lambda col: col.str.lower())\n col1 col2 col3 col4\n 0 A 2 0 a\n 1 A 1 1 B\n 2 B 9 9 c\n 3 NaN 8 4 D\n 4 D 7 2 e\n 5 C 4 3 F\n\n Another typical example is\n natural sorting.\n This can be done using\n ``natsort`` package,\n which provides a function to generate a key\n to sort data in their natural order:\n\n >>> df = pd.DataFrame(\n ... {\n ... \"hours\": [\"0hr\", \"128hr\", \"0hr\", \"64hr\", \"64hr\", \"128hr\"],\n ... \"mins\": [\n ... \"10mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"40mins\",\n ... \"10mins\",\n ... \"10mins\",\n ... ],\n ... \"value\": [10, 20, 30, 40, 50, 60],\n ... }\n ... )\n >>> df\n hours mins value\n 0 0hr 10mins 10\n 1 128hr 40mins 20\n 2 0hr 40mins 30\n 3 64hr 40mins 40\n 4 64hr 10mins 50\n 5 128hr 10mins 60\n >>> from natsort import natsort_keygen\n >>> df.sort_values(\n ... by=[\"hours\", \"mins\"],\n ... key=natsort_keygen(),\n ... 
)\n hours mins value\n 0 0hr 10mins 10\n 2 0hr 40mins 30\n 4 64hr 10mins 50\n 3 64hr 40mins 40\n 5 128hr 10mins 60\n 1 128hr 40mins 20\n \"\"\"\n inplace = validate_bool_kwarg(inplace, \"inplace\")\n axis = self._get_axis_number(axis)\n ascending = validate_ascending(ascending)\n if not isinstance(by, list):\n by = [by]\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool, List[bool]]\";\n # expected \"Sized\"\n if is_sequence(ascending) and (\n len(by) != len(ascending) # type: ignore[arg-type]\n ):\n # error: Argument 1 to \"len\" has incompatible type \"Union[bool,\n # List[bool]]\"; expected \"Sized\"\n raise ValueError(\n f\"Length of ascending ({len(ascending)})\" # type: ignore[arg-type]\n f\" != length of by ({len(by)})\"\n )\n if len(by) > 1:\n keys = (self._get_label_or_level_values(x, axis=axis) for x in by)\n\n # need to rewrap columns in Series to apply key function\n if key is not None:\n keys_data = [\n Series(k, name=name) for (k, name) in zip(keys, by, strict=True)\n ]\n else:\n # error: Argument 1 to \"list\" has incompatible type\n # \"Generator[ExtensionArray | ndarray[Any, Any], None, None]\";\n # expected \"Iterable[Series]\"\n keys_data = list(keys) # type: ignore[arg-type]\n\n indexer = lexsort_indexer(\n keys_data, orders=ascending, na_position=na_position, key=key\n )\n elif by:\n # len(by) == 1\n\n k = self._get_label_or_level_values(by[0], axis=axis)\n\n # need to rewrap column in Series to apply key function\n if key is not None:\n # error: Incompatible types in assignment (expression has type\n # \"Series\", variable has type \"ndarray\")\n k = Series(k, name=by[0]) # type: ignore[assignment]\n\n if isinstance(ascending, (tuple, list)):\n ascending = ascending[0]\n\n indexer = nargsort(\n k, kind=kind, ascending=ascending, na_position=na_position, key=key\n )\n elif inplace:\n return self._update_inplace(self)\n else:\n return self.copy(deep=False)\n\n if is_range_indexer(indexer, len(indexer)):\n result = 
self.copy(deep=False)\n if ignore_index:\n result.index = default_index(len(result))\n\n if inplace:\n return self._update_inplace(result)\n else:\n return result\n\n new_data = self._mgr.take(\n indexer, axis=self._get_block_manager_axis(axis), verify=False\n )\n\n if ignore_index:\n new_data.set_axis(\n self._get_block_manager_axis(axis), default_index(len(indexer))\n )\n\n result = self._constructor_from_mgr(new_data, axes=new_data.axes)\n if inplace:\n return self._update_inplace(result)\n else:\n return result.__finalize__(self, method=\"sort_values\")\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[True],\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> None: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: Literal[False] = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame: ...\n\n @overload\n def sort_index(\n self,\n *,\n axis: Axis = ...,\n level: IndexLabel = ...,\n ascending: bool | Sequence[bool] = ...,\n inplace: bool = ...,\n kind: SortKind = ...,\n na_position: NaPosition = ...,\n sort_remaining: bool = ...,\n ignore_index: bool = ...,\n key: IndexKeyFunc = ...,\n ) -> DataFrame | None: ...\n\n def sort_index(\n self,\n *,\n axis: Axis = 0,\n level: IndexLabel | None = None,\n ascending: bool | Sequence[bool] = True,\n inplace: bool = False,\n kind: SortKind = \"quicksort\",\n na_position: NaPosition = \"last\",\n sort_remaining: bool = True,\n ignore_index: bool = False,\n key: IndexKeyFunc | None = None,\n ) -> DataFrame | None:\n \"\"\"\n Sort object by labels (along an axis).\n\n Returns a new DataFrame sorted by label if `inplace` 
argument is\n ``False``, otherwise updates the original DataFrame and returns None.\n\n Parameters\n ----------\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis along which to sort. The value 0 identifies the rows,\n and 1 identifies the columns.\n level : int or level name or list of ints or list of level names\n If not None, sort on values in specified index level(s).\n ascending : bool or list-like of bools, default True\n Sort ascending vs. descending. When the index is a MultiIndex the\n sort direction can be controlled for each level individually.\n inplace : bool, default False\n Whether to modify the DataFrame rather than creating a new one.\n kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'\n Choice of sorting algorithm. See also :func:`numpy.sort` for more\n information. `mergesort` and `stable` are the only stable algorithms. For\n DataFrames, this option is only applied when sorting on a single\n column or label.\n na_position : {'first', 'last'}, default 'last'\n Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.\n Not implemented for MultiIndex.\n sort_remaining : bool, default True\n If True and sorting by level and index is multilevel, sort by other\n levels too (in order) after sorting by specified level.\n ignore_index : bool, default False\n If True, the resulting axis will be labeled 0, 1, \u2026, n - 1.\n key : callable, optional\n If not None, apply the key function to the index values\n before sorting. This is similar to the `key` argument in the\n builtin :meth:`sorted` function, with the notable difference that\n this `key` function should be *vectorized*. It should expect an\n ``Index`` and return an ``Index`` of the same shape. 
For MultiIndex\n inputs, the key is applied *per level*.\n\n Returns\n -------\n DataFrame or None\n The original DataFrame sorted by the labels or None if ``inplace=True``.\n\n See Also\n --------\n Series.sort_index : Sort Series by the index.\n DataFrame.sort_values : Sort DataFrame by the value.\n Series.sort_values : Sort Series by the value.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... [1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=[\"A\"]\n ... )\n >>> df.sort_index()\n A\n 1 4\n 29 2\n 100 1\n 150 5\n 234 3\n\n By default, it sorts in ascending order. To sort in descending order,\n use ``ascending=False``:\n\n >>> df.sort_index(ascending=False)\n A\n 234 3\n 150 5\n 100 1\n 29 2\n 1 4\n\n A key function can be specified which is applied to the index before\n sorting. For a ``MultiIndex`` this is applied to each level separately.\n\n >>> df = pd.DataFrame({\"a\": [1, 2, 3, 4]}, index=[\"A\", \"b\", \"C\", \"d\"])\n >>> df.sort_index(key=lambda x: x.str.lower())\n a\n A 1\n b 2\n C 3\n d 4\n \"\"\"\n return super().sort_index(\n axis=axis,\n level=level,\n ascending=ascending,\n inplace=inplace,\n kind=kind,\n na_position=na_position,\n sort_remaining=sort_remaining,\n ignore_index=ignore_index,\n key=key,\n )\n\n def value_counts(\n self,\n subset: IndexLabel | None = None,\n normalize: bool = False,\n sort: bool = True,\n ascending: bool = False,\n dropna: bool = True,\n ) -> Series:\n \"\"\"\n Return a Series containing the frequency of each distinct row in the DataFrame.\n\n The resulting Series is indexed by the unique row combinations found\n in the DataFrame (or the specified subset of columns). 
By default the\n counts are sorted in descending order, and rows with ``NaN`` values\n in any column are excluded.\n\n Parameters\n ----------\n subset : Hashable or a sequence of the previous, optional\n Columns to use when counting unique combinations.\n normalize : bool, default False\n Return proportions rather than frequencies.\n sort : bool, default True\n Stable sort by frequencies when True. Preserve the order of the data\n when False.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, ``sort=False`` would sort by the columns values.\n\n .. versionchanged:: 3.0.0\n\n Prior to 3.0.0, the sort was unstable.\n ascending : bool, default False\n Sort in ascending order.\n dropna : bool, default True\n Do not include counts of rows that contain NA values.\n\n Returns\n -------\n Series\n Series containing the frequency of each distinct row in the DataFrame.\n\n See Also\n --------\n Series.value_counts: Equivalent method on Series.\n\n Notes\n -----\n The returned Series will have a MultiIndex with one level per input\n column but an Index (non-multi) for a single label. By default, rows\n that contain any NA values are omitted from the result. By default,\n the resulting Series will be sorted by frequencies in descending order so that\n the first element is the most frequently-occurring row.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"num_legs\": [2, 4, 4, 6], \"num_wings\": [2, 0, 0, 0]},\n ... index=[\"falcon\", \"dog\", \"cat\", \"ant\"],\n ... 
)\n >>> df\n num_legs num_wings\n falcon 2 2\n dog 4 0\n cat 4 0\n ant 6 0\n\n >>> df.value_counts()\n num_legs num_wings\n 4 0 2\n 2 2 1\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(sort=False)\n num_legs num_wings\n 2 2 1\n 4 0 2\n 6 0 1\n Name: count, dtype: int64\n\n >>> df.value_counts(ascending=True)\n num_legs num_wings\n 2 2 1\n 6 0 1\n 4 0 2\n Name: count, dtype: int64\n\n >>> df.value_counts(normalize=True)\n num_legs num_wings\n 4 0 0.50\n 2 2 0.25\n 6 0 0.25\n Name: proportion, dtype: float64\n\n With `dropna` set to `False` we can also count rows with NA values.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"first_name\": [\"John\", \"Anne\", \"John\", \"Beth\"],\n ... \"middle_name\": [\"Smith\", pd.NA, pd.NA, \"Louise\"],\n ... }\n ... )\n >>> df\n first_name middle_name\n 0 John Smith\n 1 Anne NaN\n 2 John NaN\n 3 Beth Louise\n\n >>> df.value_counts()\n first_name middle_name\n John Smith 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(dropna=False)\n first_name middle_name\n John Smith 1\n Anne NaN 1\n John NaN 1\n Beth Louise 1\n Name: count, dtype: int64\n\n >>> df.value_counts(\"first_name\")\n first_name\n John 2\n Anne 1\n Beth 1\n Name: count, dtype: int64\n \"\"\"\n if subset is None:\n subset = self.columns.tolist()\n\n name = \"proportion\" if normalize else \"count\"\n counts = self.groupby(\n subset, sort=False, dropna=dropna, observed=False\n )._grouper.size()\n counts.name = name\n\n if sort:\n counts = counts.sort_values(ascending=ascending, kind=\"stable\")\n if normalize:\n counts /= counts.sum()\n\n # Force MultiIndex for a list_like subset with a single column\n if is_list_like(subset) and len(subset) == 1: # type: ignore[arg-type]\n counts.index = MultiIndex.from_arrays(\n [counts.index], names=[counts.index.name]\n )\n\n return counts\n\n def nlargest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by 
`columns` in descending order.\n\n Return the first `n` rows with the largest values in `columns`, in\n descending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=False).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of rows to return.\n columns : Hashable or a sequence of the previous\n Column label(s) to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : prioritize the first occurrence(s)\n - ``last`` : prioritize the last occurrence(s)\n - ``all`` : keep all the ties of the smallest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n The first `n` rows ordered by the given columns in descending\n order.\n\n See Also\n --------\n DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in\n ascending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Notes\n -----\n This function cannot be used with all column types. For example, when\n specifying columns with `object` or `category` dtypes, ``TypeError`` is\n raised.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 11300 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nlargest`` to select the three\n rows having the largest values in column \"population\".\n\n >>> df.nlargest(3, \"population\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nlargest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n\n When using ``keep='all'``, the number of elements kept can exceed ``n``;\n if there are duplicate values for the smallest element, all the\n ties are kept:\n\n >>> df.nlargest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n However, ``nlargest`` does not keep ``n`` distinct largest elements:\n\n >>> df.nlargest(5, \"population\", keep=\"all\")\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n\n To order by the largest values in column \"population\" and then \"GDP\",\n we can specify multiple columns like in the next example.\n\n >>> df.nlargest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n France 65000000 2583560 FR\n Italy 59000000 1937894 IT\n Brunei 434000 12128 BN\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()\n\n def nsmallest(\n self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = \"first\"\n ) -> DataFrame:\n \"\"\"\n Return the first `n` rows ordered by `columns` in ascending 
order.\n\n Return the first `n` rows with the smallest values in `columns`, in\n ascending order. The columns that are not specified are returned as\n well, but not used for ordering.\n\n This method is equivalent to\n ``df.sort_values(columns, ascending=True).head(n)``, but more\n performant.\n\n Parameters\n ----------\n n : int\n Number of items to retrieve.\n columns : list or str\n Column name or names to order by.\n keep : {'first', 'last', 'all'}, default 'first'\n Where there are duplicate values:\n\n - ``first`` : take the first occurrence.\n - ``last`` : take the last occurrence.\n - ``all`` : keep all the ties of the largest item even if it means\n selecting more than ``n`` items.\n\n Returns\n -------\n DataFrame\n DataFrame with the first `n` rows ordered by `columns` in ascending order.\n\n See Also\n --------\n DataFrame.nlargest : Return the first `n` rows ordered by `columns` in\n descending order.\n DataFrame.sort_values : Sort DataFrame by the values.\n DataFrame.head : Return the first `n` rows without re-ordering.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"population\": [\n ... 59000000,\n ... 65000000,\n ... 434000,\n ... 434000,\n ... 434000,\n ... 337000,\n ... 337000,\n ... 11300,\n ... 11300,\n ... ],\n ... \"GDP\": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],\n ... \"alpha-2\": [\"IT\", \"FR\", \"MT\", \"MV\", \"BN\", \"IS\", \"NR\", \"TV\", \"AI\"],\n ... },\n ... index=[\n ... \"Italy\",\n ... \"France\",\n ... \"Malta\",\n ... \"Maldives\",\n ... \"Brunei\",\n ... \"Iceland\",\n ... \"Nauru\",\n ... \"Tuvalu\",\n ... \"Anguilla\",\n ... ],\n ... 
)\n >>> df\n population GDP alpha-2\n Italy 59000000 1937894 IT\n France 65000000 2583560 FR\n Malta 434000 12011 MT\n Maldives 434000 4520 MV\n Brunei 434000 12128 BN\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n\n In the following example, we will use ``nsmallest`` to select the\n three rows having the smallest values in column \"population\".\n\n >>> df.nsmallest(3, \"population\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n\n When using ``keep='last'``, ties are resolved in reverse order:\n\n >>> df.nsmallest(3, \"population\", keep=\"last\")\n population GDP alpha-2\n Anguilla 11300 311 AI\n Tuvalu 11300 38 TV\n Nauru 337000 182 NR\n\n When using ``keep='all'``, the number of elements kept can exceed ``n``;\n if there are duplicate values for the largest element, all the\n ties are kept.\n\n >>> df.nsmallest(3, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n However, ``nsmallest`` does not keep ``n`` distinct\n smallest elements:\n\n >>> df.nsmallest(4, \"population\", keep=\"all\")\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Iceland 337000 17036 IS\n Nauru 337000 182 NR\n\n To order by the smallest values in column \"population\" and then \"GDP\", we can\n specify multiple columns like in the next example.\n\n >>> df.nsmallest(3, [\"population\", \"GDP\"])\n population GDP alpha-2\n Tuvalu 11300 38 TV\n Anguilla 11300 311 AI\n Nauru 337000 182 NR\n \"\"\"\n return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()\n\n def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:\n \"\"\"\n Swap levels i and j in a :class:`MultiIndex`.\n\n Default is to swap the two innermost levels of the index.\n\n Parameters\n ----------\n i, j : int or str\n Levels of the indices to be swapped. 
Can pass level name as string.\n axis : {0 or 'index', 1 or 'columns'}, default 0\n The axis to swap levels on. 0 or 'index' for row-wise, 1 or\n 'columns' for column-wise.\n\n Returns\n -------\n DataFrame\n DataFrame with levels swapped in MultiIndex.\n\n See Also\n --------\n DataFrame.reorder_levels: Reorder levels of MultiIndex.\n DataFrame.sort_index: Sort MultiIndex.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"Grade\": [\"A\", \"B\", \"A\", \"C\"]},\n ... index=[\n ... [\"Final exam\", \"Final exam\", \"Coursework\", \"Coursework\"],\n ... [\"History\", \"Geography\", \"History\", \"Geography\"],\n ... [\"January\", \"February\", \"March\", \"April\"],\n ... ],\n ... )\n >>> df\n Grade\n Final exam History January A\n Geography February B\n Coursework History March A\n Geography April C\n\n In the following example, we will swap the levels of the indices.\n Here, we swap the levels of the row index (``axis=0``, the default); column\n levels can be swapped in a similar manner with ``axis=1``.\n By not supplying any arguments for i and j, we swap the last and second to\n last indices.\n\n >>> df.swaplevel()\n Grade\n Final exam January History A\n February Geography B\n Coursework March History A\n April Geography C\n\n By supplying one argument, we can choose which index to swap the last\n index with. We can for example swap the first index with the last one as\n follows.\n\n >>> df.swaplevel(0)\n Grade\n January History Final exam A\n February Geography Final exam B\n March History Coursework A\n April Geography Coursework C\n\n We can also define explicitly which indices we want to swap by supplying values\n for both i and j. 
Here, we for example swap the first and second indices.\n\n >>> df.swaplevel(0, 1)\n Grade\n History Final exam January A\n Geography Final exam February B\n History Coursework March A\n Geography Coursework April C\n \"\"\"\n result = self.copy(deep=False)\n\n axis = self._get_axis_number(axis)\n\n if not isinstance(result._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only swap levels on a hierarchical axis.\")\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.swaplevel(i, j)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.swaplevel(i, j)\n return result\n\n def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:\n \"\"\"\n Rearrange index or column levels using input ``order``.\n\n May not drop or duplicate levels.\n\n Parameters\n ----------\n order : list of int or list of str\n List representing new level order. Reference level by number\n (position) or by key (label).\n axis : {0 or 'index', 1 or 'columns'}, default 0\n Where to reorder levels.\n\n Returns\n -------\n DataFrame\n DataFrame with indices or columns with reordered levels.\n\n See Also\n --------\n DataFrame.swaplevel : Swap levels i and j in a MultiIndex.\n\n Examples\n --------\n >>> data = {\n ... \"class\": [\"Mammals\", \"Mammals\", \"Reptiles\"],\n ... \"diet\": [\"Omnivore\", \"Carnivore\", \"Carnivore\"],\n ... \"species\": [\"Humans\", \"Dogs\", \"Snakes\"],\n ... 
}\n >>> df = pd.DataFrame(data, columns=[\"class\", \"diet\", \"species\"])\n >>> df = df.set_index([\"class\", \"diet\"])\n >>> df\n species\n class diet\n Mammals Omnivore Humans\n Carnivore Dogs\n Reptiles Carnivore Snakes\n\n Let's reorder the levels of the index:\n\n >>> df.reorder_levels([\"diet\", \"class\"])\n species\n diet class\n Omnivore Mammals Humans\n Carnivore Mammals Dogs\n Reptiles Snakes\n \"\"\"\n axis = self._get_axis_number(axis)\n if not isinstance(self._get_axis(axis), MultiIndex): # pragma: no cover\n raise TypeError(\"Can only reorder levels on a hierarchical axis.\")\n\n result = self.copy(deep=False)\n\n if axis == 0:\n assert isinstance(result.index, MultiIndex)\n result.index = result.index.reorder_levels(order)\n else:\n assert isinstance(result.columns, MultiIndex)\n result.columns = result.columns.reorder_levels(order)\n return result\n\n # ----------------------------------------------------------------------\n # Arithmetic Methods\n\n def _cmp_method(self, other, op) -> DataFrame:\n axis: Literal[1] = 1 # only relevant for Series other case\n\n self, other = self._align_for_op(other, axis, flex=False, level=None)\n\n # See GH#4537 for discussion of scalar op behavior\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def _arith_method(self, other, op) -> DataFrame:\n if self._should_reindex_frame_op(other, op, 1, None, None):\n return self._arith_method_with_reindex(other, op)\n\n axis: Literal[1] = 1 # only relevant for Series other case\n other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))\n\n self, other = self._align_for_op(other, axis, flex=True, level=None)\n\n with np.errstate(all=\"ignore\"):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n _logical_method = _arith_method\n\n def _dispatch_frame_op(\n self, right, func: Callable, axis: AxisInt | None = None\n ) -> 
DataFrame:\n \"\"\"\n Evaluate the frame operation func(left, right) by evaluating\n column-by-column, dispatching to the Series implementation.\n\n Parameters\n ----------\n right : scalar, Series, or DataFrame\n func : arithmetic or comparison operator\n axis : {None, 0, 1}\n\n Returns\n -------\n DataFrame\n\n Notes\n -----\n Caller is responsible for setting np.errstate where relevant.\n \"\"\"\n # Get the appropriate array-op to apply to each column/block's values.\n array_op = ops.get_array_op(func)\n\n right = lib.item_from_zerodim(right)\n if not is_list_like(right):\n # i.e. scalar, faster than checking np.ndim(right) == 0\n bm = self._mgr.apply(array_op, right=right)\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, DataFrame):\n assert self.index.equals(right.index)\n assert self.columns.equals(right.columns)\n # TODO: The previous assertion `assert right._indexed_same(self)`\n # fails in cases with empty columns reached via\n # _frame_arith_method_with_reindex\n\n bm = self._mgr.operate_blockwise(\n right._mgr,\n array_op,\n )\n return self._constructor_from_mgr(bm, axes=bm.axes)\n\n elif isinstance(right, Series) and axis == 1:\n # axis=1 means we want to operate row-by-row\n assert right.index.equals(self.columns)\n\n right = right._values\n # maybe_align_as_frame ensures we do not have an ndarray here\n assert not isinstance(right, np.ndarray)\n\n arrays = [\n array_op(_left, _right)\n for _left, _right in zip(self._iter_column_arrays(), right, strict=True)\n ]\n\n elif isinstance(right, Series):\n assert right.index.equals(self.index)\n right = right._values\n\n arrays = [array_op(left, right) for left in self._iter_column_arrays()]\n\n else:\n raise NotImplementedError(right)\n\n return type(self)._from_arrays(\n arrays, self.columns, self.index, verify_integrity=False\n )\n\n def _combine_frame(self, other: DataFrame, func, fill_value=None):\n # at this point we have `self._indexed_same(other)`\n\n if fill_value is 
None:\n # since _arith_op may be called in a loop, avoid function call\n # overhead if possible by doing this check once\n _arith_op = func\n\n else:\n\n def _arith_op(left, right):\n # for the mixed_type case where we iterate over columns,\n # _arith_op(left, right) is equivalent to\n # left._binop(right, func, fill_value=fill_value)\n left, right = ops.fill_binop(left, right, fill_value)\n return func(left, right)\n\n new_data = self._dispatch_frame_op(other, _arith_op)\n return new_data\n\n def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:\n \"\"\"\n For DataFrame-with-DataFrame operations that require reindexing,\n operate only on shared columns, then reindex.\n\n Parameters\n ----------\n right : DataFrame\n op : binary operator\n\n Returns\n -------\n DataFrame\n \"\"\"\n left = self\n\n # GH#31623, only operate on shared columns\n cols, lcol_indexer, rcol_indexer = left.columns.join(\n right.columns, how=\"inner\", return_indexers=True\n )\n\n new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]\n new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]\n\n # GH#60498 For MultiIndex column alignment\n if isinstance(cols, MultiIndex):\n # When overwriting column names, make a shallow copy so as to not modify\n # the input DFs\n new_left = new_left.copy(deep=False)\n new_right = new_right.copy(deep=False)\n new_left.columns = cols\n new_right.columns = cols\n\n result = op(new_left, new_right)\n\n # Do the join on the columns instead of using left._align_for_op\n # to avoid constructing two potentially large/sparse DataFrames\n join_columns = left.columns.join(right.columns, how=\"outer\")\n\n if result.columns.has_duplicates:\n # Avoid reindexing with a duplicate axis.\n # https://github.com/pandas-dev/pandas/issues/35194\n indexer, _ = result.columns.get_indexer_non_unique(join_columns)\n indexer = algorithms.unique1d(indexer)\n result = result._reindex_with_indexers(\n {1: [join_columns, indexer]}, 
allow_dups=True\n )\n else:\n result = result.reindex(join_columns, axis=1)\n\n return result\n\n def _should_reindex_frame_op(self, right, op, axis: int, fill_value, level) -> bool:\n \"\"\"\n Check if this is an operation between DataFrames that will need to reindex.\n \"\"\"\n\n if level is not None:\n return False\n\n if op is operator.pow or op is roperator.rpow:\n # GH#32685 pow has special semantics for operating with null values\n return False\n\n if not isinstance(right, DataFrame):\n return False\n\n if (\n (\n isinstance(self.columns, MultiIndex)\n or isinstance(right.columns, MultiIndex)\n )\n and not self.columns.equals(right.columns)\n and fill_value is None\n ):\n # GH#60498 Reindex if MultiIndex columns are not matching\n # GH#60903 Don't reindex if fill_value is provided\n return True\n\n if fill_value is None and level is None and axis == 1:\n # TODO: any other cases we should handle here?\n\n # Intersection is always unique so we have to check the unique columns\n left_uniques = self.columns.unique()\n right_uniques = right.columns.unique()\n cols = left_uniques.intersection(right_uniques)\n if len(cols) and not (\n len(cols) == len(left_uniques) and len(cols) == len(right_uniques)\n ):\n # TODO: is there a shortcut available when len(cols) == 0?\n return True\n\n return False\n\n def _align_for_op(\n self,\n other,\n axis: AxisInt,\n flex: bool | None = False,\n level: Level | None = None,\n ):\n \"\"\"\n Convert rhs to meet lhs dims if input is list, tuple or np.ndarray.\n\n Parameters\n ----------\n other : Any\n axis : int\n flex : bool or None, default False\n Whether this is a flex op, in which case we reindex.\n None indicates not to check for alignment.\n level : int or level name, default None\n\n Returns\n -------\n left : DataFrame\n right : Any\n \"\"\"\n left, right = self, other\n\n def to_series(right):\n msg = (\n \"Unable to coerce to Series, \"\n \"length must be {req_len}: given {given_len}\"\n )\n\n # pass dtype to avoid doing 
inference, which would break consistency\n # with Index/Series ops\n dtype = None\n if getattr(right, \"dtype\", None) == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if axis == 0:\n if len(left.index) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.index), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.index, dtype=dtype)\n else:\n if len(left.columns) != len(right):\n raise ValueError(\n msg.format(req_len=len(left.columns), given_len=len(right))\n )\n right = left._constructor_sliced(right, index=left.columns, dtype=dtype)\n return right\n\n if isinstance(right, np.ndarray):\n if right.ndim == 1:\n right = to_series(right)\n\n elif right.ndim == 2:\n # We need to pass dtype=right.dtype to retain object dtype\n # otherwise we lose consistency with Index and array ops\n dtype = None\n if right.dtype == object:\n # can't pass right.dtype unconditionally as that would break on e.g.\n # datetime64[h] ndarray\n dtype = object\n\n if right.shape == left.shape:\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[0] == left.shape[0] and right.shape[1] == 1:\n # Broadcast across columns\n right = np.broadcast_to(right, left.shape)\n right = left._constructor(\n right, index=left.index, columns=left.columns, dtype=dtype\n )\n\n elif right.shape[1] == left.shape[1] and right.shape[0] == 1:\n # Broadcast along rows\n right = to_series(right[0, :])\n\n else:\n raise ValueError(\n \"Unable to coerce to DataFrame, shape \"\n f\"must be {left.shape}: given {right.shape}\"\n )\n\n elif right.ndim > 2:\n raise ValueError(\n \"Unable to coerce to Series/DataFrame, \"\n f\"dimension must be <= 2: {right.shape}\"\n )\n\n elif is_list_like(right) and not isinstance(right, (Series, DataFrame)):\n if not isinstance(\n right, (np.ndarray, ExtensionArray, Index, list, dict)\n ) and not 
ops.has_castable_attr(right):\n warnings.warn(\n f\"Operation with {type(right).__name__} is deprecated. \"\n \"In a future version these will be treated as scalar-like. \"\n \"To retain the old behavior, explicitly wrap in a Series \"\n \"instead.\",\n Pandas4Warning,\n stacklevel=find_stack_level(),\n )\n\n # GH#36702. Raise when attempting arithmetic with list of array-like.\n if any(is_array_like(el) for el in right):\n raise ValueError(\n f\"Unable to coerce list of {type(right[0])} to Series/DataFrame\"\n )\n # GH#17901\n right = to_series(right)\n\n if flex is not None and isinstance(right, DataFrame):\n if not left._indexed_same(right):\n if flex:\n left, right = left.align(right, join=\"outer\", level=level)\n else:\n raise ValueError(\n \"Can only compare identically-labeled (both index and columns) \"\n \"DataFrame objects\"\n )\n elif isinstance(right, Series):\n # axis=1 is default for DataFrame-with-Series op\n axis = axis if axis is not None else 1\n if not flex:\n if not left.axes[axis].equals(right.index):\n raise ValueError(\n \"Operands are not aligned. Do \"\n \"`left, right = left.align(right, axis=1)` \"\n \"before operating.\"\n )\n\n left, right = left.align(\n right,\n join=\"outer\",\n axis=axis,\n level=level,\n )\n right = left._maybe_align_series_as_frame(right, axis)\n return left, right\n\n def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):\n \"\"\"\n If the Series operand is not EA-dtype, we can broadcast to 2D and operate\n blockwise.\n \"\"\"\n rvalues = series._values\n if lib.is_np_dtype(rvalues.dtype):\n # We can losslessly+cheaply cast to ndarray\n # i.e. 
ndarray or dt64[naive], td64\n # TODO(EA2D): no need to special case with 2D EAs\n rvalues = np.asarray(rvalues)\n\n if axis == 0:\n rvalues = rvalues.reshape(-1, 1)\n else:\n rvalues = rvalues.reshape(1, -1)\n\n rvalues = np.broadcast_to(rvalues, self.shape)\n # pass dtype to avoid doing inference\n # copy=False is safe because this is a temporary DataFrame used only\n # as the right operand in blockwise arithmetic.\n df = self._constructor(\n rvalues,\n index=self.index,\n columns=self.columns,\n dtype=rvalues.dtype,\n copy=False,\n )\n # GH#61581\n elif axis == 0:\n df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))\n else:\n nrows = self.shape[0]\n df = DataFrame(\n {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},\n dtype=rvalues.dtype,\n )\n df.index = self.index\n df.columns = self.columns\n return df.__finalize__(series)\n\n def _flex_arith_method(\n self, other, op, *, axis: Axis = \"columns\", level=None, fill_value=None\n ):\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n if self._should_reindex_frame_op(other, op, axis, fill_value, level):\n return self._arith_method_with_reindex(other, op)\n\n other = ops.maybe_prepare_scalar_for_op(other, self.shape)\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n with np.errstate(all=\"ignore\"):\n if isinstance(other, DataFrame):\n # Another DataFrame\n new_data = self._combine_frame(other, op, fill_value)\n\n elif isinstance(other, Series):\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n else:\n # in this case we always have `np.ndim(other) == 0`\n if fill_value is not None:\n self = self.fillna(fill_value)\n\n new_data = self._dispatch_frame_op(other, op)\n\n return self._construct_result(new_data, other=other)\n\n def _construct_result(self, result, other) -> DataFrame:\n \"\"\"\n Wrap the result of an arithmetic, comparison, or logical operation.\n\n Parameters\n ----------\n result : DataFrame\n\n Returns\n -------\n DataFrame\n 
\"\"\"\n out = self._constructor(result, copy=False).__finalize__(self)\n # Pin columns instead of passing to constructor for compat with\n # non-unique columns case\n out.columns = self.columns\n out.index = self.index\n out = out.__finalize__(other)\n return out\n\n def __divmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = self // other\n mod = self - div * other\n return div, mod\n\n def __rdivmod__(self, other) -> tuple[DataFrame, DataFrame]:\n # Naive implementation, room for optimization\n div = other // self\n mod = other - div * self\n return div, mod\n\n def _flex_cmp_method(\n self, other, op, *, axis: Axis = \"columns\", level=None\n ) -> DataFrame:\n axis = self._get_axis_number(axis) if axis is not None else 1\n\n self, other = self._align_for_op(other, axis, flex=True, level=level)\n\n new_data = self._dispatch_frame_op(other, op, axis=axis)\n return self._construct_result(new_data, other=other)\n\n def eq(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Equal to of dataframe and other, element-wise (binary operator `eq`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare 
DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... 
)\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.eq, axis=axis, level=level)\n\n def ne(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Not equal to of dataframe and other, element-wise (binary operator `ne`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n 
DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... )\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df == 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.eq(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df != pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True True\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.ne(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B True True\n C True True\n D True True\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df == [250, 100]\n cost revenue\n A True True\n B False False\n C False False\n\n Use the method to control the axis:\n\n >>> df.eq([250, 250, 100], axis=\"index\")\n cost revenue\n A True False\n B False True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... 
\"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... )\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.ne, axis=axis, level=level)\n\n def le(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than or equal to of dataframe and other, \\\n element-wise (binary operator `le`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. 
`NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df <= 100\n cost revenue\n A False True\n B False False\n C True False\n\n >>> df.le(100)\n cost revenue\n A False True\n B False False\n C True False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df <= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False True\n C True False\n\n Use the method to control the broadcast axis:\n\n >>> df.le(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df <= [250, 100]\n cost revenue\n A True True\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.le([250, 250, 100], axis='index')\n cost revenue\n A True True\n B True True\n C True False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.le(other)\n cost revenue\n A False True\n B False True\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.le(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.le, axis=axis, level=level)\n\n def lt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Less than of dataframe and other, element-wise (binary operator `lt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``<`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df < 100\n cost revenue\n A False False\n B False False\n C False False\n\n >>> df.lt(100)\n cost revenue\n A False False\n B False False\n C False False\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df < pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A False True\n B False False\n C False False\n\n Use the method to control the broadcast axis:\n\n >>> df.lt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A False False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df < [250, 100]\n cost revenue\n A False False\n B True False\n C True False\n\n Use the method to control the axis:\n\n >>> df.lt([250, 250, 100], axis=\"index\")\n cost revenue\n A False True\n B True False\n C False False\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.lt(other)\n cost revenue\n A False True\n B False False\n C False False\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.lt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A False True\n B True False\n C True False\n \"\"\"\n return self._flex_cmp_method(other, operator.lt, axis=axis, level=level)\n\n def ge(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than or equal to of dataframe and other, \\\n element-wise (binary operator `ge`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>=`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame({'cost': [250, 150, 100],\n ... 'revenue': [100, 250, 300]},\n ... 
index=['A', 'B', 'C'])\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df >= 100\n cost revenue\n A True True\n B True True\n C True True\n\n >>> df.ge(100)\n cost revenue\n A True True\n B True True\n C True True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df >= pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True True\n C True True\n\n Use the method to control the broadcast axis:\n\n >>> df.ge(pd.Series([100, 300], index=[\"A\", \"D\"]), axis='index')\n cost revenue\n A True True\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number elements in `other`:\n\n >>> df >= [250, 100]\n cost revenue\n A True True\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.ge([250, 250, 100], axis='index')\n cost revenue\n A True False\n B False True\n C True True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},\n ... index=['A', 'B', 'C', 'D'])\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.ge(other)\n cost revenue\n A False False\n B False True\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],\n ... 'revenue': [100, 250, 300, 200, 175, 225]},\n ... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],\n ... 
['A', 'B', 'C', 'A', 'B', 'C']])\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.ge(df_multindex, level=1)\n cost revenue\n Q1 A True True\n B True True\n C True True\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.ge, axis=axis, level=level)\n\n def gt(self, other, axis: Axis = \"columns\", level=None) -> DataFrame:\n \"\"\"\n Get Greater than of dataframe and other, element-wise (binary operator `gt`).\n\n Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison\n operators.\n\n Equivalent to ``>`` with support to choose axis\n (rows or columns) and level for comparison.\n\n Parameters\n ----------\n other : scalar, sequence, Series, or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}, default 'columns'\n Whether to compare by the index (0 or 'index') or columns\n (1 or 'columns').\n level : int or label\n Broadcast across a level, matching Index values on the passed\n MultiIndex level.\n\n Returns\n -------\n DataFrame of bool\n Result of the comparison.\n\n See Also\n --------\n DataFrame.eq : Compare DataFrames for equality elementwise.\n DataFrame.ne : Compare DataFrames for inequality elementwise.\n DataFrame.le : Compare DataFrames for less than inequality\n or equality elementwise.\n DataFrame.lt : Compare DataFrames for strictly less than\n inequality elementwise.\n DataFrame.ge : Compare DataFrames for greater than inequality\n or equality elementwise.\n DataFrame.gt : Compare DataFrames for strictly greater than\n inequality elementwise.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n `NaN` values are considered different (i.e. `NaN` != `NaN`).\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"cost\": [250, 150, 100], \"revenue\": [100, 250, 300]},\n ... index=[\"A\", \"B\", \"C\"],\n ... 
)\n >>> df\n cost revenue\n A 250 100\n B 150 250\n C 100 300\n\n Comparison with a scalar, using either the operator or method:\n\n >>> df > 100\n cost revenue\n A True False\n B True True\n C False True\n\n >>> df.gt(100)\n cost revenue\n A True False\n B True True\n C False True\n\n When `other` is a :class:`Series`, the columns of a DataFrame are aligned\n with the index of `other` and broadcast:\n\n >>> df > pd.Series([100, 250], index=[\"cost\", \"revenue\"])\n cost revenue\n A True False\n B True False\n C False True\n\n Use the method to control the broadcast axis:\n\n >>> df.gt(pd.Series([100, 300], index=[\"A\", \"D\"]), axis=\"index\")\n cost revenue\n A True False\n B False False\n C False False\n D False False\n\n When comparing to an arbitrary sequence, the number of columns must\n match the number of elements in `other`:\n\n >>> df > [250, 100]\n cost revenue\n A False False\n B False True\n C False True\n\n Use the method to control the axis:\n\n >>> df.gt([250, 250, 100], axis=\"index\")\n cost revenue\n A False False\n B False False\n C False True\n\n Compare to a DataFrame of different shape.\n\n >>> other = pd.DataFrame(\n ... {\"revenue\": [300, 250, 100, 150]}, index=[\"A\", \"B\", \"C\", \"D\"]\n ... )\n >>> other\n revenue\n A 300\n B 250\n C 100\n D 150\n\n >>> df.gt(other)\n cost revenue\n A False False\n B False False\n C False True\n D False False\n\n Compare to a MultiIndex by level.\n\n >>> df_multindex = pd.DataFrame(\n ... {\n ... \"cost\": [250, 150, 100, 150, 300, 220],\n ... \"revenue\": [100, 250, 300, 200, 175, 225],\n ... },\n ... index=[\n ... [\"Q1\", \"Q1\", \"Q1\", \"Q2\", \"Q2\", \"Q2\"],\n ... [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... ],\n ... 
)\n >>> df_multindex\n cost revenue\n Q1 A 250 100\n B 150 250\n C 100 300\n Q2 A 150 200\n B 300 175\n C 220 225\n\n >>> df.gt(df_multindex, level=1)\n cost revenue\n Q1 A False False\n B False False\n C False False\n Q2 A True False\n B False True\n C False True\n \"\"\"\n return self._flex_cmp_method(other, operator.gt, axis=axis, level=level)\n\n def add(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `add`).\n\n Equivalent to ``dataframe + other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `radd`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> df + 1\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.add(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> df + [1, 2]\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.add(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.add({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.add({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, operator.add, level=level, fill_value=fill_value, axis=axis\n )\n\n def radd(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Addition of dataframe and other, element-wise (binary operator `radd`).\n\n Equivalent to ``other + dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `add`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Add a scalar with operator version which return the same\n results.\n\n >>> 1 + df\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n >>> df.radd(1)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a list and Series by axis with operator version.\n\n >>> [1, 2] + df\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd([1, 2], axis=\"columns\")\n angles degrees\n circle 1 362\n triangle 4 182\n rectangle 5 362\n\n >>> df.radd(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 361\n triangle 4 181\n rectangle 5 361\n\n Add a dictionary by axis.\n\n >>> df.radd({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 362\n triangle 3 182\n rectangle 4 362\n\n >>> df.radd({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 5 182\n rectangle 7 363\n \"\"\"\n return self._flex_arith_method(\n other, roperator.radd, level=level, fill_value=fill_value, axis=axis\n )\n\n def sub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `sub`).\n\n Equivalent to ``dataframe - other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `rsub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract a scalar with operator version which return the same\n results.\n\n >>> df - 1\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n >>> df.sub(1)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a list and Series by axis with operator version.\n\n >>> df - [1, 2]\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub([1, 2], axis=\"columns\")\n angles degrees\n circle -1 358\n triangle 2 178\n rectangle 3 358\n\n >>> df.sub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle -1 359\n triangle 2 179\n rectangle 3 359\n\n Subtract a dictionary by axis.\n\n >>> df.sub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 358\n triangle 3 178\n rectangle 4 358\n\n >>> df.sub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 360\n triangle 1 178\n rectangle 1 357\n \"\"\"\n return self._flex_arith_method(\n other, operator.sub, level=level, fill_value=fill_value, axis=axis\n )\n\n subtract = sub\n\n def rsub(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).\n\n Equivalent to ``other - dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version, `sub`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Subtract by a scalar with operator version which return the same\n results.\n\n >>> 1 - df\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n >>> df.rsub(1)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a list and Series by axis with operator version.\n\n >>> [1, 2] - df\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub([1, 2], axis=\"columns\")\n angles degrees\n circle 1 -358\n triangle -2 -178\n rectangle -3 -358\n\n >>> df.rsub(\n ... pd.Series([1, 1, 1], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 1 -359\n triangle -2 -179\n rectangle -3 -359\n\n Subtract by a dictionary by axis.\n\n >>> df.rsub({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 -358\n triangle -3 -178\n rectangle -4 -358\n\n >>> df.rsub({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 -360\n triangle -1 -178\n rectangle -1 -357\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rsub, level=level, fill_value=fill_value, axis=axis\n )\n\n def mul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `mul`).\n\n Equivalent to ``dataframe * other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... )\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply a scalar with operator version which return the same\n results.\n\n >>> df * 2\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.mul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply a list and Series by axis with operator version.\n\n >>> df * [1, 2]\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul([1, 2], axis=\"columns\")\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.mul(\n ... pd.Series([1, 2, 3], index=[\"circle\", \"triangle\", \"rectangle\"]),\n ... axis=\"index\",\n ... 
)\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n\n Multiply a dictionary by axis.\n\n >>> df.mul({\"angles\": 0, \"degrees\": 2})\n angles degrees\n circle 0 720\n triangle 0 360\n rectangle 0 720\n\n >>> df.mul({\"circle\": 0, \"triangle\": 2, \"rectangle\": 3}, axis=\"index\")\n angles degrees\n circle 0 0\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, operator.mul, level=level, fill_value=fill_value, axis=axis\n )\n\n multiply = mul\n\n def rmul(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Multiplication of dataframe and other, \\\n element-wise (binary operator `rmul`).\n\n Equivalent to ``other * dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mul`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Multiply by a scalar.\n\n >>> 2 * df\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n >>> df.rmul(2)\n angles degrees\n circle 0 720\n triangle 6 360\n rectangle 8 720\n\n Multiply by a list and Series.\n\n >>> [1, 2] * df\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul([1, 2], axis='columns')\n angles degrees\n circle 0 720\n triangle 3 360\n rectangle 4 720\n\n >>> df.rmul(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... 
axis='index')\n angles degrees\n circle 0 360\n triangle 6 360\n rectangle 12 1080\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmul, level=level, fill_value=fill_value, axis=axis\n )\n\n def truediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `truediv`).\n\n Equivalent to ``dataframe / other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rtruediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n 
--------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df / 2\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n >>> df.truediv(2)\n angles degrees\n circle 0.0 180.0\n triangle 1.5 90.0\n rectangle 2.0 180.0\n\n Divide by a list and Series.\n\n >>> df / [1, 2]\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv([1, 2], axis='columns')\n angles degrees\n circle 0.0 180.0\n triangle 3.0 90.0\n rectangle 4.0 180.0\n\n >>> df.truediv(pd.Series([1, 2, 3], index=['circle', 'triangle', 'rectangle']),\n ... axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n\n Divide by a dictionary by axis.\n\n >>> df.truediv({'angles': 2, 'degrees': 3})\n angles degrees\n circle 0.0 120.0\n triangle 1.5 60.0\n rectangle 2.0 120.0\n\n >>> df.truediv({'circle': 1, 'triangle': 2, 'rectangle': 3}, axis='index')\n angles degrees\n circle 0.000000 360.0\n triangle 1.500000 90.0\n rectangle 1.333333 120.0\n \"\"\"\n return self._flex_arith_method(\n other, operator.truediv, level=level, fill_value=fill_value, axis=axis\n )\n\n div = truediv\n divide = truediv\n\n def rtruediv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Floating division of dataframe and other, \\\n element-wise (binary operator `rtruediv`).\n\n Equivalent to ``other / dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. 
With reverse version,\n `truediv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 1 / df\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n >>> df.rtruediv(1)\n angles degrees\n circle inf 0.002778\n triangle 0.333333 0.005556\n rectangle 0.250000 0.002778\n\n Divide a list.\n\n >>> [1, 2] / df\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n\n >>> df.rtruediv([1, 2], axis='columns')\n angles degrees\n circle inf 0.005556\n triangle 0.333333 0.011111\n rectangle 0.250000 0.005556\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rtruediv, level=level, fill_value=fill_value, axis=axis\n )\n\n rdiv = rtruediv\n\n def floordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `floordiv`).\n\n Equivalent to ``dataframe // other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rfloordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide by a scalar.\n\n >>> df // 2\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n >>> df.floordiv(2)\n angles degrees\n circle 0 180\n triangle 1 90\n rectangle 2 180\n\n Divide by a list and Series.\n\n >>> df // [1, 2]\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n\n >>> df.floordiv([1, 2], axis='columns')\n angles degrees\n circle 0 180\n triangle 3 90\n rectangle 4 180\n \"\"\"\n return self._flex_arith_method(\n other, operator.floordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def rfloordiv(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Integer division of dataframe and other, \\\n element-wise (binary operator `rfloordiv`).\n\n Equivalent to ``other // dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `floordiv`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Divide a scalar.\n\n >>> 10 // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv(10)\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n Divide a list.\n\n >>> [10, 20] // df\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n\n >>> df.rfloordiv([10, 20], axis='columns')\n angles degrees\n circle inf 0.0\n triangle 3.0 0.0\n rectangle 2.0 0.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rfloordiv, level=level, fill_value=fill_value, axis=axis\n )\n\n def mod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, element-wise (binary operator `mod`).\n\n Equivalent to ``dataframe % other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rmod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\"angles\": [0, 3, 4], \"degrees\": [360, 180, 360]},\n ... index=[\"circle\", \"triangle\", \"rectangle\"],\n ... 
)\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> df % 2\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod(2)\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n Calculate modulo with a list.\n\n >>> df % [2, 3]\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n\n >>> df.mod([2, 3], axis=\"columns\")\n angles degrees\n circle 0 0\n triangle 1 0\n rectangle 0 0\n \"\"\"\n return self._flex_arith_method(\n other, operator.mod, level=level, fill_value=fill_value, axis=axis\n )\n\n def rmod(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Modulo of dataframe and other, \\\n element-wise (binary operator `rmod`).\n\n Equivalent to ``other % dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `mod`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate modulo with a scalar.\n\n >>> 1000 % df\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n >>> df.rmod(1000)\n angles degrees\n circle NaN 280.0\n triangle 1.0 100.0\n rectangle 0.0 280.0\n\n Calculate modulo with a list.\n\n >>> [1000, 2000] % df\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n\n >>> df.rmod([1000, 2000], axis='columns')\n angles degrees\n circle NaN 200.0\n triangle 1.0 20.0\n rectangle 0.0 200.0\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rmod, level=level, fill_value=fill_value, axis=axis\n )\n\n def pow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `pow`).\n\n Equivalent to ``dataframe ** other``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `rpow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [360, 180, 360]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 360\n triangle 3 180\n rectangle 4 360\n\n Calculate exponential power with a scalar.\n\n >>> df ** 2\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n >>> df.pow(2)\n angles degrees\n circle 0 129600\n triangle 9 32400\n rectangle 16 129600\n\n Calculate exponential power with a list.\n\n >>> df ** [1, 2]\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n\n >>> df.pow([1, 2], axis='columns')\n angles degrees\n circle 0 129600\n triangle 3 32400\n rectangle 4 129600\n \"\"\"\n return self._flex_arith_method(\n other, operator.pow, level=level, fill_value=fill_value, axis=axis\n )\n\n def rpow(\n self, other, axis: Axis = \"columns\", level=None, fill_value=None\n ) -> DataFrame:\n \"\"\"\n Get Exponential power of dataframe and other, \\\n element-wise (binary operator `rpow`).\n\n Equivalent to ``other ** dataframe``, but with support to substitute a\n fill_value for missing data in one of the inputs. With reverse version,\n `pow`.\n\n Among flexible wrappers (`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`)\n to arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.\n\n Parameters\n ----------\n other : scalar, sequence, Series, dict or DataFrame\n Any single or multiple element data structure, or list-like object.\n axis : {0 or 'index', 1 or 'columns'}\n Whether to compare by the index (0 or 'index') or columns.\n (1 or 'columns'). 
For Series input, axis to match Series index on.\n level : int or label\n Broadcast across a level, matching Index values on the\n passed MultiIndex level.\n fill_value : float or None, default None\n Fill existing missing (NaN) values, and any new element needed for\n successful DataFrame alignment, with this value before computation.\n If data in both corresponding DataFrame locations is missing\n the result will be missing.\n\n Returns\n -------\n DataFrame\n Result of the arithmetic operation.\n\n See Also\n --------\n DataFrame.add : Add DataFrames.\n DataFrame.sub : Subtract DataFrames.\n DataFrame.mul : Multiply DataFrames.\n DataFrame.div : Divide DataFrames (float division).\n DataFrame.truediv : Divide DataFrames (float division).\n DataFrame.floordiv : Divide DataFrames (integer division).\n DataFrame.mod : Calculate modulo (remainder after division).\n DataFrame.pow : Calculate exponential power.\n\n Notes\n -----\n Mismatched indices will be unioned together.\n\n Examples\n --------\n >>> df = pd.DataFrame({'angles': [0, 3, 4],\n ... 'degrees': [3, 1, 3]},\n ... 
index=['circle', 'triangle', 'rectangle'])\n >>> df\n angles degrees\n circle 0 3\n triangle 3 1\n rectangle 4 3\n\n Calculate exponential power with a scalar.\n\n >>> 2 ** df\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n >>> df.rpow(2)\n angles degrees\n circle 1 8\n triangle 8 2\n rectangle 16 8\n\n Calculate exponential power with a list.\n\n >>> [2, 3] ** df\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n\n >>> df.rpow([2, 3], axis='columns')\n angles degrees\n circle 1 27\n triangle 8 3\n rectangle 16 27\n \"\"\"\n return self._flex_arith_method(\n other, roperator.rpow, level=level, fill_value=fill_value, axis=axis\n )\n\n # ----------------------------------------------------------------------\n # Combination-Related\n\n def compare(\n self,\n other: DataFrame,\n align_axis: Axis = 1,\n keep_shape: bool = False,\n keep_equal: bool = False,\n result_names: Suffixes = (\"self\", \"other\"),\n ) -> DataFrame:\n \"\"\"\n Compare to another DataFrame and show the differences.\n\n This method compares two DataFrames element-wise and returns a DataFrame\n highlighting the differences.\n\n Parameters\n ----------\n other : DataFrame\n Object to compare with.\n\n align_axis : {0 or 'index', 1 or 'columns'}, default 1\n Determine which axis to align the comparison on.\n\n * 0, or 'index' : Resulting differences are stacked vertically\n with rows drawn alternately from self and other.\n * 1, or 'columns' : Resulting differences are aligned horizontally\n with columns drawn alternately from self and other.\n\n keep_shape : bool, default False\n If true, all rows and columns are kept.\n Otherwise, only the ones with different values are kept.\n\n keep_equal : bool, default False\n If true, the result keeps values that are equal.\n Otherwise, equal values are shown as NaNs.\n\n result_names : tuple, default ('self', 'other')\n Set the dataframes names in the comparison.\n\n Returns\n -------\n DataFrame\n DataFrame that shows the 
differences stacked side by side.\n\n If align_axis is 0 or 'index', the resulting row index will be a\n MultiIndex with 'self' and 'other' stacked alternately at the\n inner level.\n\n If align_axis is 1 or 'columns' (the default), the resulting\n columns will be a MultiIndex with 'self' and 'other' stacked\n alternately at the inner level.\n\n Raises\n ------\n ValueError\n When the two DataFrames don't have identical labels or shape.\n\n See Also\n --------\n Series.compare : Compare with another Series and show differences.\n DataFrame.equals : Test whether two objects contain the same elements.\n\n Notes\n -----\n Matching NaNs will not appear as a difference.\n\n Can only compare identically-labeled\n (i.e. same shape, identical row and column labels) DataFrames\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"col1\": [\"a\", \"a\", \"b\", \"b\", \"a\"],\n ... \"col2\": [1.0, 2.0, 3.0, np.nan, 5.0],\n ... \"col3\": [1.0, 2.0, 3.0, 4.0, 5.0],\n ... },\n ... columns=[\"col1\", \"col2\", \"col3\"],\n ... 
)\n >>> df\n col1 col2 col3\n 0 a 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 3.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n >>> df2 = df.copy()\n >>> df2.loc[0, \"col1\"] = \"c\"\n >>> df2.loc[2, \"col3\"] = 4.0\n >>> df2\n col1 col2 col3\n 0 c 1.0 1.0\n 1 a 2.0 2.0\n 2 b 3.0 4.0\n 3 b NaN 4.0\n 4 a 5.0 5.0\n\n Align the differences on columns\n\n >>> df.compare(df2)\n col1 col3\n self other self other\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Assign result_names\n\n >>> df.compare(df2, result_names=(\"left\", \"right\"))\n col1 col3\n left right left right\n 0 a c NaN NaN\n 2 NaN NaN 3.0 4.0\n\n Stack the differences on rows\n\n >>> df.compare(df2, align_axis=0)\n col1 col3\n 0 self a NaN\n other c NaN\n 2 self NaN 3.0\n other NaN 4.0\n\n Keep the equal values\n\n >>> df.compare(df2, keep_equal=True)\n col1 col3\n self other self other\n 0 a c 1.0 1.0\n 2 b b 3.0 4.0\n\n Keep all original rows and columns\n\n >>> df.compare(df2, keep_shape=True)\n col1 col2 col3\n self other self other self other\n 0 a c NaN NaN NaN NaN\n 1 NaN NaN NaN NaN NaN NaN\n 2 NaN NaN NaN NaN 3.0 4.0\n 3 NaN NaN NaN NaN NaN NaN\n 4 NaN NaN NaN NaN NaN NaN\n\n Keep all original rows and columns and also all original values\n\n >>> df.compare(df2, keep_shape=True, keep_equal=True)\n col1 col2 col3\n self other self other self other\n 0 a c 1.0 1.0 1.0 1.0\n 1 a a 2.0 2.0 2.0 2.0\n 2 b b 3.0 3.0 3.0 4.0\n 3 b b NaN NaN 4.0 4.0\n 4 a a 5.0 5.0 5.0 5.0\n \"\"\"\n return super().compare(\n other=other,\n align_axis=align_axis,\n keep_shape=keep_shape,\n keep_equal=keep_equal,\n result_names=result_names,\n )\n\n def combine(\n self,\n other: DataFrame,\n func: Callable[[Series, Series], Series | Hashable],\n fill_value=None,\n overwrite: bool = True,\n ) -> DataFrame:\n \"\"\"\n Perform column-wise combine with another DataFrame.\n\n Combines a DataFrame with `other` DataFrame using `func`\n to element-wise combine columns. 
The row and column indexes of the\n resulting DataFrame will be the union of the two.\n\n Parameters\n ----------\n other : DataFrame\n The DataFrame to merge column-wise.\n func : function\n Function that takes two series as inputs and return a Series or a\n scalar. Used to merge the two dataframes column by columns.\n fill_value : scalar value, default None\n The value to fill NaNs with prior to passing any column to the\n merge func.\n overwrite : bool, default True\n If True, columns in `self` that do not exist in `other` will be\n overwritten with NaNs.\n\n Returns\n -------\n DataFrame\n Combination of the provided DataFrames.\n\n See Also\n --------\n DataFrame.combine_first : Combine two DataFrame objects and default to\n non-null values in frame calling the method.\n\n Examples\n --------\n Combine using a simple function that chooses the smaller column.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2\n >>> df1.combine(df2, take_smaller)\n A B\n 0 0 3\n 1 0 3\n\n Example using a true element-wise combine function.\n\n >>> df1 = pd.DataFrame({\"A\": [5, 0], \"B\": [2, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, np.minimum)\n A B\n 0 1 2\n 1 0 3\n\n Using `fill_value` fills Nones prior to passing the column to the\n merge function.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine(df2, take_smaller, fill_value=-5)\n A B\n 0 0 -5.0\n 1 0 4.0\n\n Example that demonstrates the use of `overwrite` and behavior when\n the axis differ between the dataframes.\n\n >>> df1 = pd.DataFrame({\"A\": [0, 0], \"B\": [4, 4]})\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [-10, 1],\n ... },\n ... index=[1, 2],\n ... 
)\n >>> df1.combine(df2, take_smaller)\n A B C\n 0 NaN NaN NaN\n 1 NaN 3.0 -10.0\n 2 NaN 3.0 1.0\n\n >>> df1.combine(df2, take_smaller, overwrite=False)\n A B C\n 0 0.0 NaN NaN\n 1 0.0 3.0 -10.0\n 2 NaN 3.0 1.0\n\n Demonstrating the preference of the passed in dataframe.\n\n >>> df2 = pd.DataFrame(\n ... {\n ... \"B\": [3, 3],\n ... \"C\": [1, 1],\n ... },\n ... index=[1, 2],\n ... )\n >>> df2.combine(df1, take_smaller)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 NaN 0.0\n 2 3.0 NaN NaN\n\n >>> df2.combine(df1, take_smaller, overwrite=False)\n B C A\n 0 NaN NaN 0.0\n 1 3.0 1.0 0.0\n 2 3.0 1.0 NaN\n \"\"\"\n other_idxlen = len(other.index) # save for compare\n other_columns = other.columns\n\n this, other = self.align(other)\n new_index = this.index\n\n if other.empty and len(new_index) == len(self.index):\n return self.copy()\n\n if self.empty and len(other) == other_idxlen:\n return other.copy()\n\n # preserve column order\n new_columns = self.columns.union(other_columns, sort=False)\n this = this.reindex(new_columns, axis=1)\n other = other.reindex(new_columns, axis=1)\n\n do_fill = fill_value is not None\n result = {}\n for i in range(this.shape[1]):\n series = this.iloc[:, i]\n other_series = other.iloc[:, i]\n\n this_dtype = series.dtype\n other_dtype = other_series.dtype\n\n this_mask = isna(series)\n other_mask = isna(other_series)\n\n # don't overwrite columns unnecessarily\n # DO propagate if this column is not in the intersection\n if not overwrite and other_mask.all():\n result[i] = series.copy()\n continue\n\n if do_fill:\n series = series.copy()\n other_series = other_series.copy()\n series[this_mask] = fill_value\n other_series[other_mask] = fill_value\n\n if new_columns[i] not in self.columns:\n # If self DataFrame does not have col in other DataFrame,\n # try to promote series, which is all NaN, as other_dtype.\n new_dtype = other_dtype\n try:\n series = series.astype(new_dtype)\n except ValueError:\n # e.g. 
new_dtype is integer types\n pass\n else:\n # if we have different dtypes, possibly promote\n new_dtype = find_common_type([this_dtype, other_dtype])\n series = series.astype(new_dtype)\n other_series = other_series.astype(new_dtype)\n\n arr = func(series, other_series)\n if isinstance(new_dtype, np.dtype):\n # if new_dtype is an EA Dtype, then `func` is expected to return\n # the correct dtype without any additional casting\n # error: No overload variant of \"maybe_downcast_to_dtype\" matches\n # argument types \"Union[Series, Hashable]\", \"dtype[Any]\"\n arr = maybe_downcast_to_dtype( # type: ignore[call-overload]\n arr, new_dtype\n )\n\n result[i] = arr\n\n frame_result = self._constructor(result, index=new_index)\n frame_result.columns = new_columns\n return frame_result.__finalize__(self, method=\"combine\")\n\n def combine_first(self, other: DataFrame) -> DataFrame:\n \"\"\"\n Update null elements with value in the same location in `other`.\n\n Combine two DataFrame objects by filling null values in one DataFrame\n with non-null values from other DataFrame. The row and column indexes\n of the resulting DataFrame will be the union of the two. 
The resulting\n dataframe contains the 'first' dataframe values and overrides the\n second one values where both first.loc[index, col] and\n second.loc[index, col] are not missing values, upon calling\n first.combine_first(second).\n\n Parameters\n ----------\n other : DataFrame\n Provided DataFrame to use to fill null values.\n\n Returns\n -------\n DataFrame\n The result of combining the provided DataFrame with the other object.\n\n See Also\n --------\n DataFrame.combine : Perform series-wise operation on two DataFrames\n using a given function.\n\n Examples\n --------\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [None, 4]})\n >>> df2 = pd.DataFrame({\"A\": [1, 1], \"B\": [3, 3]})\n >>> df1.combine_first(df2)\n A B\n 0 1.0 3.0\n 1 0.0 4.0\n\n Null values still persist if the location of that null value\n does not exist in `other`\n\n >>> df1 = pd.DataFrame({\"A\": [None, 0], \"B\": [4, None]})\n >>> df2 = pd.DataFrame({\"B\": [3, 3], \"C\": [1, 1]}, index=[1, 2])\n >>> df1.combine_first(df2)\n A B C\n 0 NaN 4.0 NaN\n 1 0.0 3.0 1.0\n 2 NaN 3.0 1.0\n \"\"\"\n\n def combiner(x: Series, y: Series):\n # GH#60128 The combiner is supposed to preserve EA Dtypes.\n return y if y.name not in self.columns else y.where(x.isna(), x)\n\n if len(other) == 0:\n combined = self.reindex(\n self.columns.append(other.columns.difference(self.columns)), axis=1\n )\n combined = combined.astype(other.dtypes)\n else:\n combined = self.combine(other, combiner, overwrite=False)\n\n dtypes = {\n # Check for isinstance(..., (np.dtype, ExtensionDtype))\n # to prevent raising on non-unique columns see GH#29135.\n # Note we will just not-cast in these cases.\n col: find_common_type([self.dtypes[col], other.dtypes[col]])\n for col in self.columns.intersection(other.columns)\n if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))\n and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))\n and combined.dtypes[col] != self.dtypes[col]\n }\n\n if dtypes:\n combined = 
combined.astype(dtypes)\n\n return combined.__finalize__(self, method=\"combine_first\")\n\n def update(\n self,\n other,\n join: UpdateJoin = \"left\",\n overwrite: bool = True,\n filter_func=None,\n errors: IgnoreRaise = \"ignore\",\n ) -> None:\n \"\"\"\n Modify in place using non-NA values from another DataFrame.\n\n Aligns on indices. There is no return value.\n\n Parameters\n ----------\n other : DataFrame, or object coercible into a DataFrame\n Should have at least one matching index/column label\n with the original DataFrame. If a Series is passed,\n its name attribute must be set, and that will be\n used as the column name to align with the original DataFrame.\n join : {'left'}, default 'left'\n Only left join is implemented, keeping the index and columns of the\n original object.\n overwrite : bool, default True\n How to handle non-NA values for overlapping keys:\n\n * True: overwrite original DataFrame's values\n with values from `other`.\n * False: only update values that are NA in\n the original DataFrame.\n\n filter_func : callable(1d-array) -> bool 1d-array, optional\n Can choose to replace values other than NA. Return True for values\n that should be updated.\n errors : {'raise', 'ignore'}, default 'ignore'\n If 'raise', will raise a ValueError if the DataFrame and `other`\n both contain non-NA data in the same place.\n\n Returns\n -------\n None\n This method directly changes calling object.\n\n Raises\n ------\n ValueError\n * When `errors='raise'` and there's overlapping non-NA data.\n * When `errors` is not either `'ignore'` or `'raise'`\n NotImplementedError\n * If `join != 'left'`\n\n See Also\n --------\n dict.update : Similar method for dictionaries.\n DataFrame.merge : For column(s)-on-column(s) operations.\n\n Notes\n -----\n 1. 
Duplicate indices on `other` are not supported and raises `ValueError`.\n\n Examples\n --------\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400, 500, 600]})\n >>> new_df = pd.DataFrame({\"B\": [4, 5, 6], \"C\": [7, 8, 9]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4\n 1 2 5\n 2 3 6\n\n The DataFrame's length does not increase as a result of the update,\n only values at matching index/column labels are updated.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"e\", \"f\", \"g\", \"h\", \"i\"]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_df = pd.DataFrame({\"B\": [\"d\", \"f\"]}, index=[0, 2])\n >>> df.update(new_df)\n >>> df\n A B\n 0 a d\n 1 b y\n 2 c f\n\n For Series, its name attribute must be set.\n\n >>> df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\"], \"B\": [\"x\", \"y\", \"z\"]})\n >>> new_column = pd.Series([\"d\", \"e\", \"f\"], name=\"B\")\n >>> df.update(new_column)\n >>> df\n A B\n 0 a d\n 1 b e\n 2 c f\n\n If `other` contains NaNs the corresponding values are not updated\n in the original dataframe.\n\n >>> df = pd.DataFrame({\"A\": [1, 2, 3], \"B\": [400.0, 500.0, 600.0]})\n >>> new_df = pd.DataFrame({\"B\": [4, np.nan, 6]})\n >>> df.update(new_df)\n >>> df\n A B\n 0 1 4.0\n 1 2 500.0\n 2 3 6.0\n \"\"\"\n if not CHAINED_WARNING_DISABLED:\n if sys.getrefcount(\n self\n ) <= REF_COUNT_METHOD and not com.is_local_in_caller_frame(self):\n warnings.warn(\n _chained_assignment_method_update_msg,\n ChainedAssignmentError,\n stacklevel=2,\n )\n\n # TODO: Support other joins\n if join != \"left\": # pragma: no cover\n raise NotImplementedError(\"Only left join is supported\")\n if errors not in [\"ignore\", \"raise\"]:\n raise ValueError(\"The parameter errors must be either 'ignore' or 'raise'\")\n\n if not isinstance(other, DataFrame):\n other = 
DataFrame(other)\n\n if other.index.has_duplicates:\n raise ValueError(\"Update not allowed with duplicate indexes on other.\")\n\n index_intersection = other.index.intersection(self.index)\n if index_intersection.empty:\n return\n other = other.reindex(index_intersection)\n this_data = self.loc[index_intersection]\n\n for col in self.columns.intersection(other.columns):\n this = this_data[col]\n that = other[col]\n\n if filter_func is not None:\n mask = ~filter_func(this) | isna(that)\n else:\n if errors == \"raise\":\n mask_this = notna(that)\n mask_that = notna(this)\n if any(mask_this & mask_that):\n raise ValueError(\"Data overlaps.\")\n\n if overwrite:\n mask = isna(that)\n else:\n mask = notna(this)\n\n # don't overwrite columns unnecessarily\n if mask.all():\n continue\n\n self.loc[index_intersection, col] = this.where(mask, that)\n\n # ----------------------------------------------------------------------\n # Data reshaping\n @deprecate_nonkeyword_arguments(\n Pandas4Warning, allowed_args=[\"self\", \"by\", \"level\"], name=\"groupby\"\n )\n def groupby(\n self,\n by=None,\n level: IndexLabel | None = None,\n as_index: bool = True,\n sort: bool = True,\n group_keys: bool = True,\n observed: bool = True,\n dropna: bool = True,\n ) -> DataFrameGroupBy:\n \"\"\"\n Group DataFrame using a mapper or by a Series of columns.\n\n A groupby operation involves some combination of splitting the\n object, applying a function, and combining the results. This can be\n used to group large amounts of data and compute operations on these\n groups.\n\n Parameters\n ----------\n by : mapping, function, label, pd.Grouper or list of such\n Used to determine the groups for the groupby.\n If ``by`` is a function, it's called on each value of the object's\n index. If a dict or Series is passed, the Series or dict VALUES\n will be used to determine the groups (the Series' values are first\n aligned; see ``.align()`` method). 
If a list or ndarray of length\n equal to the number of rows is passed (see the `groupby user guide\n `_),\n the values are used as-is to determine the groups. A label or list\n of labels may be passed to group by the columns in ``self``.\n Notice that a tuple is interpreted as a (single) key.\n level : int, level name, or sequence of such, default None\n If the axis is a MultiIndex (hierarchical), group by a particular\n level or levels. Do not specify both ``by`` and ``level``.\n as_index : bool, default True\n Return object with group labels as the\n index. Only relevant for DataFrame input. as_index=False is\n effectively \"SQL-style\" grouped output. This argument has no effect\n on filtrations (see the `filtrations in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n sort : bool, default True\n Sort group keys. Get better performance by turning this off.\n Note this does not influence the order of observations within each\n group. Groupby preserves the order of rows within each group. If False,\n the groups will appear in the same order as they did in the original\n DataFrame.\n This argument has no effect on filtrations (see the `filtrations\n in the user guide\n `_),\n such as ``head()``, ``tail()``, ``nth()`` and in transformations\n (see the `transformations in the user guide\n `_).\n\n .. versionchanged:: 2.0.0\n\n Specifying ``sort=False`` with an ordered categorical grouper will no\n longer sort the values.\n\n group_keys : bool, default True\n When calling apply and the ``by`` argument produces a like-indexed\n (i.e. :ref:`a transform `) result, add group keys to\n index to identify pieces. By default group keys are not included\n when the result's index (and column) labels match the inputs, and\n are included otherwise.\n\n .. 
versionchanged:: 2.0.0\n\n ``group_keys`` now defaults to ``True``.\n\n observed : bool, default True\n This only applies if any of the groupers are Categoricals.\n If True: only show observed values for categorical groupers.\n If False: show all values for categorical groupers.\n\n .. versionchanged:: 3.0.0\n\n The default value is now ``True``.\n\n dropna : bool, default True\n If True, and if group keys contain NA values, NA values together\n with row/column will be dropped.\n If False, NA values will also be treated as the key in groups.\n\n Returns\n -------\n pandas.api.typing.DataFrameGroupBy\n Returns a groupby object that contains information about the groups.\n\n See Also\n --------\n resample : Convenience method for frequency conversion and resampling\n of time series.\n\n Notes\n -----\n See the `user guide\n `__ for more\n detailed usage and examples, including splitting an object into groups,\n iterating through groups, selecting a group, aggregation, and more.\n\n The implementation of groupby is hash-based, meaning in particular that\n objects that compare as equal will be considered to be in the same group.\n An exception to this is that pandas has special handling of NA values:\n any NA values will be collapsed to a single group, regardless of how\n they compare. See the user guide linked above for more details.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... )\n >>> df\n Animal Max Speed\n 0 Falcon 380.0\n 1 Falcon 370.0\n 2 Parrot 24.0\n 3 Parrot 26.0\n >>> df.groupby([\"Animal\"]).mean()\n Max Speed\n Animal\n Falcon 375.0\n Parrot 25.0\n\n **Hierarchical Indexes**\n\n We can groupby different levels of a hierarchical index\n using the `level` parameter:\n\n >>> arrays = [\n ... [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... [\"Captive\", \"Wild\", \"Captive\", \"Wild\"],\n ... 
]\n >>> index = pd.MultiIndex.from_arrays(arrays, names=(\"Animal\", \"Type\"))\n >>> df = pd.DataFrame({\"Max Speed\": [390.0, 350.0, 30.0, 20.0]}, index=index)\n >>> df\n Max Speed\n Animal Type\n Falcon Captive 390.0\n Wild 350.0\n Parrot Captive 30.0\n Wild 20.0\n >>> df.groupby(level=0).mean()\n Max Speed\n Animal\n Falcon 370.0\n Parrot 25.0\n >>> df.groupby(level=\"Type\").mean()\n Max Speed\n Type\n Captive 210.0\n Wild 185.0\n\n We can also choose to include NA in group keys or not by setting\n `dropna` parameter, the default setting is `True`.\n\n >>> arr = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=[\"b\"]).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n\n >>> df.groupby(by=[\"b\"], dropna=False).sum()\n a c\n b\n 1.0 2 3\n 2.0 2 5\n NaN 1 4\n\n >>> arr = [[\"a\", 12, 12], [None, 12.3, 33.0], [\"b\", 12.3, 123], [\"a\", 1, 1]]\n >>> df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n\n >>> df.groupby(by=\"a\").sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n\n >>> df.groupby(by=\"a\", dropna=False).sum()\n b c\n a\n a 13.0 13.0\n b 12.3 123.0\n NaN 12.3 33.0\n\n When using ``.apply()``, use ``group_keys`` to include or exclude the\n group keys. The ``group_keys`` argument defaults to ``True`` (include).\n\n >>> df = pd.DataFrame(\n ... {\n ... \"Animal\": [\"Falcon\", \"Falcon\", \"Parrot\", \"Parrot\"],\n ... \"Max Speed\": [380.0, 370.0, 24.0, 26.0],\n ... }\n ... 
)\n >>> df.groupby(\"Animal\", group_keys=True)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n Animal\n Falcon 0 380.0\n 1 370.0\n Parrot 2 24.0\n 3 26.0\n\n >>> df.groupby(\"Animal\", group_keys=False)[[\"Max Speed\"]].apply(lambda x: x)\n Max Speed\n 0 380.0\n 1 370.0\n 2 24.0\n 3 26.0\n \"\"\"\n from pandas.core.groupby.generic import DataFrameGroupBy\n\n if level is None and by is None:\n raise TypeError(\"You have to supply one of 'by' and 'level'\")\n\n return DataFrameGroupBy(\n obj=self,\n keys=by,\n level=level,\n as_index=as_index,\n sort=sort,\n group_keys=group_keys,\n observed=observed,\n dropna=dropna,\n )\n\n def pivot(\n self, *, columns, index=lib.no_default, values=lib.no_default\n ) -> DataFrame:\n \"\"\"\n Return reshaped DataFrame organized by given index / column values.\n\n Reshape data (produce a \"pivot\" table) based on column values. Uses\n unique values from specified `index` / `columns` to form axes of the\n resulting DataFrame. This function does not support data\n aggregation, multiple values will result in a MultiIndex in the\n columns. See the :ref:`User Guide ` for more on reshaping.\n\n Parameters\n ----------\n columns : Hashable or a sequence of the previous\n Column to use to make new frame's columns.\n index : Hashable or a sequence of the previous, optional\n Column to use to make new frame's index. If not given, uses existing index.\n values : Hashable or a sequence of the previous, optional\n Column(s) to use for populating new frame's values. If not\n specified, all remaining columns will be used and the result will\n have hierarchically indexed columns.\n\n Returns\n -------\n DataFrame\n Returns reshaped DataFrame.\n\n Raises\n ------\n ValueError:\n When there are any `index`, `columns` combinations with multiple\n values. 
`DataFrame.pivot_table` when you need to aggregate.\n\n See Also\n --------\n DataFrame.pivot_table : Generalization of pivot that can handle\n duplicate values for one index/column pair.\n DataFrame.unstack : Pivot based on the index values instead of a\n column.\n wide_to_long : Wide panel to long format. Less flexible but more\n user-friendly than melt.\n\n Notes\n -----\n For finer-tuned control, see hierarchical indexing documentation along\n with the related stack/unstack methods.\n\n Reference :ref:`the user guide ` for more examples.\n\n Examples\n --------\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"one\", \"two\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4, 5, 6],\n ... \"zoo\": [\"x\", \"y\", \"z\", \"q\", \"w\", \"t\"],\n ... }\n ... )\n >>> df\n foo bar baz zoo\n 0 one A 1 x\n 1 one B 2 y\n 2 one C 3 z\n 3 two A 4 q\n 4 two B 5 w\n 5 two C 6 t\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\")[\"baz\"]\n bar A B C\n foo\n one 1 2 3\n two 4 5 6\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=[\"baz\", \"zoo\"])\n baz zoo\n bar A B C A B C\n foo\n one 1 2 3 x y z\n two 4 5 6 q w t\n\n You could also assign a list of column names or a list of index names.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"lev1\": [1, 1, 1, 2, 2, 2],\n ... \"lev2\": [1, 1, 2, 1, 1, 2],\n ... \"lev3\": [1, 2, 1, 2, 1, 2],\n ... \"lev4\": [1, 2, 3, 4, 5, 6],\n ... \"values\": [0, 1, 2, 3, 4, 5],\n ... }\n ... 
)\n >>> df\n lev1 lev2 lev3 lev4 values\n 0 1 1 1 1 0\n 1 1 1 2 2 1\n 2 1 2 1 3 2\n 3 2 1 2 4 3\n 4 2 1 1 5 4\n 5 2 2 2 6 5\n\n >>> df.pivot(index=\"lev1\", columns=[\"lev2\", \"lev3\"], values=\"values\")\n lev2 1 2\n lev3 1 2 1 2\n lev1\n 1 0.0 1.0 2.0 NaN\n 2 4.0 3.0 NaN 5.0\n\n >>> df.pivot(index=[\"lev1\", \"lev2\"], columns=[\"lev3\"], values=\"values\")\n lev3 1 2\n lev1 lev2\n 1 1 0.0 1.0\n 2 2.0 NaN\n 2 1 4.0 3.0\n 2 NaN 5.0\n\n A ValueError is raised if there are any duplicates.\n\n >>> df = pd.DataFrame(\n ... {\n ... \"foo\": [\"one\", \"one\", \"two\", \"two\"],\n ... \"bar\": [\"A\", \"A\", \"B\", \"C\"],\n ... \"baz\": [1, 2, 3, 4],\n ... }\n ... )\n >>> df\n foo bar baz\n 0 one A 1\n 1 one A 2\n 2 two B 3\n 3 two C 4\n\n Notice that the first two rows are the same for our `index`\n and `columns` arguments.\n\n >>> df.pivot(index=\"foo\", columns=\"bar\", values=\"baz\")\n Traceback (most recent call last):\n ...\n ValueError: Index contains duplicate entries, cannot reshape\n \"\"\"\n from pandas.core.reshape.pivot import pivot\n\n return pivot(self, index=index, columns=columns, values=values)\n\n```\n\nReproduce the first 20 lines of the body of function `pivot`.", "tags": ["codegen", "python", "positional-recall", "long-context"]} +{"id":"needle-front-4k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-front-4k\nApproximate target context: 4000 tokens; needle position: front.\nFind the Python benchmark needle for needle-front-4k. 
Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\n# InferHarness context needle: needle-front-4k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_4K_FRONT\"\n# End InferHarness context needle\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n 
infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n 
dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data 
structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. 
For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. 
numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n```\n","tags":["context-window","needle-retrieval","python","front","4k"],"expected_answer":"IH_NEEDLE_4K_FRONT","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"needle_position":"front","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-middle-4k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-middle-4k\nApproximate target context: 4000 tokens; needle position: middle.\nFind the Python benchmark needle for needle-middle-4k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import 
import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom 
pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n 
ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n# InferHarness context needle: needle-middle-4k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_4K_MIDDLE\"\n# End InferHarness context needle\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. 
numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n```\n","tags":["context-window","needle-retrieval","python","middle","4k"],"expected_answer":"IH_NEEDLE_4K_MIDDLE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"needle_position":"middle","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-late-4k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-late-4k\nApproximate target context: 4000 tokens; needle position: late_80_percent.\nFind the Python benchmark needle for needle-late-4k. Reply exactly with the string assigned to BENCHMARK_NEEDLE, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional 
import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom 
pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n 
ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n# InferHarness context needle: needle-late-4k\nBENCHMARK_NEEDLE = \"IH_NEEDLE_4K_LATE\"\n# End InferHarness context needle\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and 
will raise in a future version. \"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. 
numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n```\n","tags":["context-window","needle-retrieval","python","late_80_percent","4k"],"expected_answer":"IH_NEEDLE_4K_LATE","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"needle_position":"late_80_percent","needle_count":1,"evaluation_mode":"single_needle_exact_value"}} +{"id":"needle-two-facts-4k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: needle-two-facts-4k\nApproximate target context: 4000 tokens; needle position: two_facts_20_and_80_percent.\nFind both Python benchmark needles for needle-two-facts-4k. Reply with the string assigned to BENCHMARK_NEEDLE_ALPHA, then a | separator, then the string assigned to BENCHMARK_NEEDLE_BETA, with no quotes, no markdown, and no extra text.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import 
CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n 
NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\n# InferHarness context needle: needle-two-facts-4k alpha\nBENCHMARK_NEEDLE_ALPHA = \"IH_NEEDLE_4K_ALPHA\"\n# End InferHarness context needle alpha\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n 
FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. 
Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n 
... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... )\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n# InferHarness context needle: needle-two-facts-4k beta\nBENCHMARK_NEEDLE_BETA = \"IH_NEEDLE_4K_BETA\"\n# End InferHarness context needle beta\n \"is deprecated and will raise in a 
future version. \"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. 
numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n```\n","tags":["context-window","needle-retrieval","python","two-facts","4k"],"expected_answer":"IH_NEEDLE_4K_ALPHA|IH_NEEDLE_4K_BETA","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"needle_position":"two_facts_20_and_80_percent","needle_count":2,"evaluation_mode":"two_fact_exact_values"}} +{"id":"negative-control-4k","system_prompt":"You are a strict context retrieval engine. Return only the requested value or NOT_FOUND.","prompt":"Context-window benchmark item: negative-control-4k\nApproximate target context: 4000 tokens; needle position: absent.\nThe source may or may not contain a Python benchmark needle for negative-control-4k. If the needle is absent, reply exactly: NOT_FOUND.\n\n\n```python\nDataFrame\n---------\nAn efficient 2D container for potentially mixed-type time series or other\nlabeled data series.\n\nSimilar to its R counterpart, data.frame, except providing automatic data\nalignment and a host of useful data manipulation methods having to do with the\nlabeling information\n\"\"\"\n\nfrom __future__ import annotations\n\nimport collections\nfrom collections import abc\nimport functools\nfrom io import StringIO\nimport itertools\nimport operator\nimport sys\nfrom typing import (\n TYPE_CHECKING,\n Any,\n Literal,\n Self,\n cast,\n overload,\n)\nimport warnings\n\nimport numpy as np\nfrom numpy import ma\n\nfrom pandas._config.config import _global_config as config\n\nfrom pandas._libs import (\n algos as libalgos,\n lib,\n properties,\n)\nfrom pandas._libs.hashtable import duplicated\nfrom pandas._libs.lib import is_range_indexer\nfrom pandas.compat import CHAINED_WARNING_DISABLED\nfrom pandas.compat._constants import (\n REF_COUNT,\n REF_COUNT_METHOD,\n)\nfrom pandas.compat._optional import import_optional_dependency\nfrom pandas.compat.numpy 
import function as nv\nfrom pandas.errors import (\n ChainedAssignmentError,\n InvalidIndexError,\n Pandas4Warning,\n)\nfrom pandas.errors.cow import (\n _chained_assignment_method_update_msg,\n _chained_assignment_msg,\n)\nfrom pandas.util._decorators import (\n deprecate_nonkeyword_arguments,\n set_module,\n)\nfrom pandas.util._exceptions import (\n find_stack_level,\n)\nfrom pandas.util._validators import (\n validate_ascending,\n validate_bool_kwarg,\n validate_percentile,\n)\n\nfrom pandas.core.dtypes.cast import (\n LossySetitemError,\n can_hold_element,\n construct_1d_arraylike_from_scalar,\n construct_2d_arraylike_from_scalar,\n find_common_type,\n infer_dtype_from_scalar,\n invalidate_string_dtypes,\n maybe_downcast_to_dtype,\n maybe_unbox_numpy_scalar,\n)\nfrom pandas.core.dtypes.common import (\n infer_dtype_from_object,\n is_1d_only_ea_dtype,\n is_array_like,\n is_bool_dtype,\n is_dataclass,\n is_dict_like,\n is_float,\n is_float_dtype,\n is_hashable,\n is_integer,\n is_integer_dtype,\n is_iterator,\n is_list_like,\n is_scalar,\n is_sequence,\n is_string_dtype,\n needs_i8_conversion,\n pandas_dtype,\n)\nfrom pandas.core.dtypes.concat import concat_compat\nfrom pandas.core.dtypes.dtypes import (\n ArrowDtype,\n BaseMaskedDtype,\n ExtensionDtype,\n)\nfrom pandas.core.dtypes.generic import (\n ABCIndex,\n ABCSeries,\n)\nfrom pandas.core.dtypes.missing import (\n isna,\n notna,\n)\n\nfrom pandas.core import (\n algorithms,\n common as com,\n nanops,\n ops,\n roperator,\n)\nfrom pandas.core.accessor import Accessor\nfrom pandas.core.apply import reconstruct_and_relabel_result\nfrom pandas.core.array_algos.take import take_2d_multi\nfrom pandas.core.arraylike import OpsMixin\nfrom pandas.core.arrays import (\n BaseMaskedArray,\n DatetimeArray,\n ExtensionArray,\n NumpyExtensionArray,\n PeriodArray,\n TimedeltaArray,\n)\nfrom pandas.core.arrays.sparse import SparseFrameAccessor\nfrom pandas.core.arrays.string_ import StringDtype\nfrom pandas.core.construction 
import (\n ensure_wrapped_if_datetimelike,\n sanitize_array,\n sanitize_masked_array,\n)\nfrom pandas.core.generic import NDFrame\nfrom pandas.core.indexers import check_key_length\nfrom pandas.core.indexes.api import (\n DatetimeIndex,\n Index,\n PeriodIndex,\n default_index,\n ensure_index,\n ensure_index_from_sequences,\n)\nfrom pandas.core.indexes.multi import (\n MultiIndex,\n maybe_droplevels,\n)\nfrom pandas.core.indexing import (\n check_bool_indexer,\n check_dict_or_set_indexers,\n infer_and_maybe_downcast,\n)\nfrom pandas.core.internals import BlockManager\nfrom pandas.core.internals.construction import (\n arrays_to_mgr,\n dataclasses_to_dicts,\n dict_to_mgr,\n ndarray_to_mgr,\n nested_data_to_arrays,\n rec_array_to_mgr,\n reorder_arrays,\n to_arrays,\n treat_as_nested,\n)\nfrom pandas.core.methods import selectn\nfrom pandas.core.reshape.melt import melt\nfrom pandas.core.series import Series\nfrom pandas.core.sorting import (\n get_group_index,\n lexsort_indexer,\n nargsort,\n)\n\nfrom pandas.io.common import get_handle\nfrom pandas.io.formats import (\n console,\n format as fmt,\n)\nfrom pandas.io.formats.info import DataFrameInfo\nimport pandas.plotting\n\nif TYPE_CHECKING:\n from collections.abc import (\n Callable,\n Hashable,\n Iterable,\n Iterator,\n Mapping,\n Sequence,\n )\n import datetime\n\n from pandas._libs.internals import BlockValuesRefs\n from pandas._typing import (\n AggFuncType,\n AnyAll,\n AnyArrayLike,\n ArrayLike,\n ArrowArrayExportable,\n ArrowStreamExportable,\n Axes,\n Axis,\n AxisInt,\n ColspaceArgType,\n CompressionOptions,\n CorrelationMethod,\n DropKeep,\n Dtype,\n DtypeObj,\n FilePath,\n FloatFormatType,\n FormattersType,\n Frequency,\n FromDictOrient,\n HashableT,\n HashableT2,\n IgnoreRaise,\n IndexKeyFunc,\n IndexLabel,\n JoinValidate,\n Level,\n ListLike,\n MergeHow,\n MergeValidate,\n MutableMappingT,\n NaPosition,\n NsmallestNlargestKeep,\n ParquetCompressionOptions,\n PythonFuncType,\n QuantileInterpolation,\n 
ReadBuffer,\n ReindexMethod,\n Renamer,\n Scalar,\n SequenceNotStr,\n SortKind,\n StorageOptions,\n Suffixes,\n T,\n ToStataByteorder,\n ToTimestampHow,\n UpdateJoin,\n ValueKeyFunc,\n WriteBuffer,\n XMLParsers,\n npt,\n )\n\n from pandas.core.groupby.generic import DataFrameGroupBy\n from pandas.core.interchange.dataframe_protocol import DataFrame as DataFrameXchg\n\n from pandas.io.formats.style import Styler\n\n\n# -----------------------------------------------------------------------\n# DataFrame class\n\n\n@set_module(\"pandas\")\nclass DataFrame(NDFrame, OpsMixin):\n \"\"\"\n Two-dimensional, size-mutable, potentially heterogeneous tabular data.\n\n Data structure also contains labeled axes (rows and columns).\n Arithmetic operations align on both row and column labels. Can be\n thought of as a dict-like container for Series objects. The primary\n pandas data structure.\n\n Parameters\n ----------\n data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame\n Dict can contain Series, arrays, constants, dataclass or list-like objects. If\n data is a dict, column order follows insertion-order. If a dict contains Series\n which have an index defined, it is aligned by its index. This alignment also\n occurs if data is a Series or a DataFrame itself. Alignment is done on\n Series/DataFrame inputs.\n\n If data is a list of dicts, column order follows insertion-order.\n\n index : Index or array-like\n Index to use for resulting frame. Will default to RangeIndex if\n no indexing information part of input data and no index provided.\n columns : Index or array-like\n Column labels to use for resulting frame when data does not have them,\n defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,\n will perform column selection instead.\n dtype : dtype, default None\n Data type to force. Only a single dtype is allowed. 
If None, infer.\n If ``data`` is DataFrame then is ignored.\n copy : bool or None, default None\n Copy data from inputs.\n For dict data, the default of None behaves like ``copy=True``. For DataFrame\n or 2d ndarray input, the default of None behaves like ``copy=False``.\n If data is a dict containing one or more Series (possibly of different dtypes),\n ``copy=False`` will ensure that these inputs are not copied.\n\n See Also\n --------\n DataFrame.from_records : Constructor from tuples, also record arrays.\n DataFrame.from_dict : From dicts of Series, arrays, or dicts.\n read_csv : Read a comma-separated values (csv) file into DataFrame.\n read_table : Read general delimited file into DataFrame.\n read_clipboard : Read text from clipboard into DataFrame.\n\n Notes\n -----\n Please reference the :ref:`User Guide ` for more information.\n\n Examples\n --------\n Constructing DataFrame from a dictionary.\n\n >>> d = {\"col1\": [1, 2], \"col2\": [3, 4]}\n >>> df = pd.DataFrame(data=d)\n >>> df\n col1 col2\n 0 1 3\n 1 2 4\n\n Notice that the inferred dtype is int64.\n\n >>> df.dtypes\n col1 int64\n col2 int64\n dtype: object\n\n To enforce a single dtype:\n\n >>> df = pd.DataFrame(data=d, dtype=np.int8)\n >>> df.dtypes\n col1 int8\n col2 int8\n dtype: object\n\n Constructing DataFrame from a dictionary including Series:\n\n >>> d = {\"col1\": [0, 1, 2, 3], \"col2\": pd.Series([2, 3], index=[2, 3])}\n >>> pd.DataFrame(data=d, index=[0, 1, 2, 3])\n col1 col2\n 0 0 NaN\n 1 1 NaN\n 2 2 2.0\n 3 3 3.0\n\n Constructing DataFrame from numpy ndarray:\n\n >>> df2 = pd.DataFrame(\n ... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=[\"a\", \"b\", \"c\"]\n ... )\n >>> df2\n a b c\n 0 1 2 3\n 1 4 5 6\n 2 7 8 9\n\n Constructing DataFrame from a numpy ndarray that has labeled columns:\n\n >>> data = np.array(\n ... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],\n ... dtype=[(\"a\", \"i4\"), (\"b\", \"i4\"), (\"c\", \"i4\")],\n ... 
)\n >>> df3 = pd.DataFrame(data, columns=[\"c\", \"a\"])\n >>> df3\n c a\n 0 3 1\n 1 6 4\n 2 9 7\n\n Constructing DataFrame from dataclass:\n\n >>> from dataclasses import make_dataclass\n >>> Point = make_dataclass(\"Point\", [(\"x\", int), (\"y\", int)])\n >>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])\n x y\n 0 0 0\n 1 0 3\n 2 2 3\n\n Constructing DataFrame from Series/DataFrame:\n\n >>> ser = pd.Series([1, 2, 3], index=[\"a\", \"b\", \"c\"])\n >>> df = pd.DataFrame(data=ser, index=[\"a\", \"c\"])\n >>> df\n 0\n a 1\n c 3\n\n >>> df1 = pd.DataFrame([1, 2, 3], index=[\"a\", \"b\", \"c\"], columns=[\"x\"])\n >>> df2 = pd.DataFrame(data=df1, index=[\"a\", \"c\"])\n >>> df2\n x\n a 1\n c 3\n \"\"\"\n\n _internal_names_set = {\"columns\", \"index\"} | NDFrame._internal_names_set\n _typ = \"dataframe\"\n _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)\n _accessors: set[str] = {\"sparse\"}\n _hidden_attrs: frozenset[str] = NDFrame._hidden_attrs | frozenset([])\n _mgr: BlockManager\n\n # similar to __array_priority__, positions DataFrame before Series, Index,\n # and ExtensionArray. 
Should NOT be overridden by subclasses.\n __pandas_priority__ = 4000\n\n @property\n def _constructor(self) -> type[DataFrame]:\n return DataFrame\n\n def _constructor_from_mgr(self, mgr, axes) -> DataFrame:\n df = DataFrame._from_mgr(mgr, axes=axes)\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor is DataFrame`, but\n # this check is slightly faster, benefiting the most-common case.\n return df\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.DataFrame object.\n return self._constructor(df)\n\n _constructor_sliced: Callable[..., Series] = Series\n\n def _constructor_sliced_from_mgr(self, mgr, axes) -> Series:\n ser = Series._from_mgr(mgr, axes)\n # Use object.__setattr__ to bypass NDFrame.__setattr__ overhead\n object.__setattr__(ser, \"_name\", None) # caller sets real name\n\n if type(self) is DataFrame:\n # This would also work `if self._constructor_sliced is Series`, but\n # this check is slightly faster, benefiting the most-common case.\n return ser\n\n # We assume that the subclass __init__ knows how to handle a\n # pd.Series object.\n return self._constructor_sliced(ser)\n\n # ----------------------------------------------------------------------\n # Constructors\n\n def __init__(\n self,\n data=None,\n index: Axes | None = None,\n columns: Axes | None = None,\n dtype: Dtype | None = None,\n copy: bool | None = None,\n ) -> None:\n allow_mgr = False\n if dtype is not None:\n dtype = self._validate_dtype(dtype)\n\n if isinstance(data, DataFrame):\n data = data._mgr\n allow_mgr = True\n if not copy:\n # if not copying data, ensure to still return a shallow copy\n # to avoid the result sharing the same Manager\n data = data.copy(deep=False)\n\n if isinstance(data, BlockManager):\n if not allow_mgr:\n # GH#52419\n warnings.warn(\n f\"Passing a {type(data).__name__} to {type(self).__name__} \"\n \"is deprecated and will raise in a future version. 
\"\n \"Use public APIs instead.\",\n Pandas4Warning,\n stacklevel=2,\n )\n\n data = data.copy(deep=False)\n # first check if a Manager is passed without any other arguments\n # -> use fastpath (without checking Manager type)\n if index is None and columns is None and dtype is None and not copy:\n # GH#33357 fastpath\n NDFrame.__init__(self, data)\n return\n\n # GH47215\n if isinstance(index, set):\n raise ValueError(\"index cannot be a set\")\n if isinstance(columns, set):\n raise ValueError(\"columns cannot be a set\")\n\n if copy is None:\n if isinstance(data, dict):\n # retain pre-GH#38939 default behavior\n copy = True\n elif not isinstance(data, (Index, DataFrame, Series)):\n copy = True\n else:\n copy = False\n\n if data is None:\n index = index if index is not None else default_index(0)\n columns = columns if columns is not None else default_index(0)\n dtype = dtype if dtype is not None else pandas_dtype(object)\n data = []\n\n if isinstance(data, BlockManager):\n mgr = self._init_mgr(\n data, axes={\"index\": index, \"columns\": columns}, dtype=dtype, copy=copy\n )\n\n elif isinstance(data, dict):\n # GH#38939 de facto copy defaults to False only in non-dict cases\n mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)\n elif isinstance(data, ma.MaskedArray):\n from numpy.ma import mrecords\n\n # masked recarray\n if isinstance(data, mrecords.MaskedRecords):\n raise TypeError(\n \"MaskedRecords are not supported. Pass \"\n \"{name: data[name] for name in data.dtype.names} \"\n \"instead\"\n )\n\n # a masked array\n data = sanitize_masked_array(data)\n mgr = ndarray_to_mgr(\n data,\n index,\n columns,\n dtype=dtype,\n copy=copy,\n )\n\n elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):\n if data.dtype.names:\n # i.e. numpy structured array\n data = cast(\"np.ndarray\", data)\n mgr = rec_array_to_mgr(\n data,\n index,\n columns,\n dtype,\n copy,\n )\n elif isinstance(data, (ABCSeries, ABCIndex)) and data.name is not None:\n # i.e. 
Series/Index with non-None name\n mgr = dict_to_mgr(\n```\n","tags":["context-window","needle-retrieval","python","negative-control","4k"],"expected_answer":"NOT_FOUND","expected_format":"free_text","metadata":{"source_file":"backend/data/datasets/frame.py","target_context_tokens":4000,"needle_position":"absent","needle_count":0,"evaluation_mode":"negative_control_not_found"}} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-128k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-128k-v1.json new file mode 100644 index 0000000..10349fd --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-128k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-function-retrieval-128k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-function-retrieval-128k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:79e2c6b53c00e6ca288911567ded0d814063a57164ca813697f8469caa3496cd", + "item_count": 5, + "item_hashes": [ + { + "item_id": "function-front-128k", + "hash": "sha256:89ee905cb4cc217124513eeb69a1ad9b6943ad25c2483a435630396f4dc01d77" + }, + { + "item_id": "function-middle-128k", + "hash": "sha256:e436ebd5b3d2ac6e99a9a14916f2f8bfcdf4200b8d4da3134450563fe4966f26" + }, + { + "item_id": "function-late-128k", + "hash": "sha256:a3c2f11070896c8160823a94162f32dfd73ef0c8a1a03fda10f71a8b4e5541cb" + }, + { + "item_id": "function-two-blocks-128k", + "hash": "sha256:c0d36aa2a4fece4db0cc7c85eb3ba57ec4ab770e3bcb593e012c13250853cf85" + }, + { + "item_id": "function-negative-control-128k", + "hash": "sha256:0a4a3ba3a168b9fbc378dcb2d53b7aab7174d1377df5a74599e253de1a08dd58" + } + ], + 
"item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-function-retrieval-128k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-function-retrieval-128k.jsonl", + "dataset_family": "function_retrieval", + "context_window_tokens": 128000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-16k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-16k-v1.json new file mode 100644 index 0000000..518d714 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-16k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-function-retrieval-16k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-function-retrieval-16k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:09236e8d77ff47bac3e7169528c1796d64352b374d68ba5a68b06615e812f658", + "item_count": 5, + "item_hashes": [ + { + "item_id": "function-front-16k", + "hash": "sha256:ccba4364f62e5bb1b3978480ea1fc44f8b09fac5f072fb8751a92b9d293f686d" + }, + { + "item_id": "function-middle-16k", + "hash": "sha256:c0fe9f547fe73a296dc9aa9b374438cfbbd05fd77542bddcfbb0f545b404c524" + }, + { + "item_id": "function-late-16k", + "hash": "sha256:53ea3ec7ecb0c4a6590a11927148b8b412711c67be97ed0d910e4854fd5d5ed4" + }, + { + "item_id": "function-two-blocks-16k", + "hash": "sha256:d3d523a69337e9ab6737385c2aecb1f81f0b163de97b8d6ba1ad725ea097bdd0" + }, + { + "item_id": "function-negative-control-16k", + "hash": "sha256:740e4305fe70ec13cbf964ec1609b53002de9361e3c9ce74674a141b73e0f698" + } + ], + 
"item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-function-retrieval-16k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-function-retrieval-16k.jsonl", + "dataset_family": "function_retrieval", + "context_window_tokens": 16000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-256k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-256k-v1.json new file mode 100644 index 0000000..09f3c5e --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-256k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-function-retrieval-256k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-function-retrieval-256k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:02d2521f1fdee676be076b4340172a885638c4ea423083449822ff8d8a68b3fe", + "item_count": 5, + "item_hashes": [ + { + "item_id": "function-front-256k", + "hash": "sha256:0e85afd5e91a8bc215138347cfdfecc699988c8371b815d33c598d68c86e6e5a" + }, + { + "item_id": "function-middle-256k", + "hash": "sha256:cb47668518b5ec9e941573ea3ffe2e1f0b689210084c9b594212fd3b327fb05f" + }, + { + "item_id": "function-late-256k", + "hash": "sha256:3f44c8c34f7296c932eb0bd88049598c07a4317ff1e805fefd732671b00ac257" + }, + { + "item_id": "function-two-blocks-256k", + "hash": "sha256:2e7076c23277f5d9c058f8c722bdf177fbcb4b56f22b65675b209111a63356dd" + }, + { + "item_id": "function-negative-control-256k", + "hash": "sha256:3d54549df5d2bc058f05d74fdd9634024300b609e14429985e45281ff60de882" + } + ], + 
"item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-function-retrieval-256k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-function-retrieval-256k.jsonl", + "dataset_family": "function_retrieval", + "context_window_tokens": 256000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-32k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-32k-v1.json new file mode 100644 index 0000000..7acd059 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-32k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-function-retrieval-32k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-function-retrieval-32k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:98cf5203642c463fabb29612a54cf706d9f576cd2c5f094d73c9f3b1ee1522b7", + "item_count": 5, + "item_hashes": [ + { + "item_id": "function-front-32k", + "hash": "sha256:68ca37b68f58d12e23141a9c28a1a9a325aac1d801bc2d8447302ab37f0c987e" + }, + { + "item_id": "function-middle-32k", + "hash": "sha256:8056c08c6e62bc679a875414d74ed37c896fca27ee0db269460eb20e6b0a484e" + }, + { + "item_id": "function-late-32k", + "hash": "sha256:1fe797370a033041244642da97f0e753c0a70c351496790684a8f5b2dc775918" + }, + { + "item_id": "function-two-blocks-32k", + "hash": "sha256:2d7622f5f996710ce57f45d1ee421ba2976bac35980d56e7fdc61817bf1bdc07" + }, + { + "item_id": "function-negative-control-32k", + "hash": "sha256:ffa07c1f7adc7825f317eaf4f53df3a63da8524ca1e4d4f91104bd3c9096d57b" + } + ], + 
"item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-function-retrieval-32k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-function-retrieval-32k.jsonl", + "dataset_family": "function_retrieval", + "context_window_tokens": 32000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-4k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-4k-v1.json new file mode 100644 index 0000000..4e093c9 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-4k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-function-retrieval-4k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-function-retrieval-4k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:8f8a71a734d9cbb8a69072f467c3c673cad01d536fa2859f36881ccdb51e4f1b", + "item_count": 5, + "item_hashes": [ + { + "item_id": "function-front-4k", + "hash": "sha256:6e451c4c3f7a8ae153c5d87a12e8e89b8926796837061ba88b6d92033a4b2153" + }, + { + "item_id": "function-middle-4k", + "hash": "sha256:11c61d18fb7c07348d863b346d58f95cd6f38abfae9120f32f81070eb1a13828" + }, + { + "item_id": "function-late-4k", + "hash": "sha256:7b9750f61e53e247af61a613962198bf21d582be01cdf00c35c99a7843d72eea" + }, + { + "item_id": "function-two-blocks-4k", + "hash": "sha256:ec32f76349e8e849f8b1b724e84d6508b9eba8a00f7f8fde693760508bd842bb" + }, + { + "item_id": "function-negative-control-4k", + "hash": "sha256:24356452d9944f42a44afe705133ec909caae784e3a10125bb084898a5d02c6b" + } + ], + "item_manifest_ref": null, + 
"snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-function-retrieval-4k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-function-retrieval-4k.jsonl", + "dataset_family": "function_retrieval", + "context_window_tokens": 4000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-64k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-64k-v1.json new file mode 100644 index 0000000..bc0b9a5 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-64k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-function-retrieval-64k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-function-retrieval-64k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:4226782d48c4cf0a4e881353cd2b351700388a55eebe541a33f615e72ddd14c7", + "item_count": 5, + "item_hashes": [ + { + "item_id": "function-front-64k", + "hash": "sha256:c016db5396e11fa24f5f26ed5578d368f0cf1cdbcf08c7d12bd87a010c1eed60" + }, + { + "item_id": "function-middle-64k", + "hash": "sha256:aaabb6be8b5d035d539b962d13816708c173537c58b2a91eab298b22f323d773" + }, + { + "item_id": "function-late-64k", + "hash": "sha256:079b1e26a928739a3c2efd18b8f390435b557d8ff27de973f7f28e773e10825e" + }, + { + "item_id": "function-two-blocks-64k", + "hash": "sha256:4c686a8c640dc80fb3b5b43e2790aad97667be6e52c8d97cdba1dd6082dc0021" + }, + { + "item_id": "function-negative-control-64k", + "hash": "sha256:dc9b91faf80539d11ff62cd8867794a2a14b84770501225e0b1655ea2bff90ff" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": 
null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-function-retrieval-64k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-function-retrieval-64k.jsonl", + "dataset_family": "function_retrieval", + "context_window_tokens": 64000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-8k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-8k-v1.json new file mode 100644 index 0000000..de1e995 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-function-retrieval-8k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-function-retrieval-8k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-function-retrieval-8k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:7630f14fcfae4de49e9bbd3fca5796c168d5bab42b602b83ee1c1646f16450ec", + "item_count": 5, + "item_hashes": [ + { + "item_id": "function-front-8k", + "hash": "sha256:96854137cc3fcbb21fe6de14c7de0aa189360a8f0f099eb29febc9d2a041cc8d" + }, + { + "item_id": "function-middle-8k", + "hash": "sha256:ad326f29edc6e1e941ea2729ffd69c684a9ddf99984b2e6c1e4810cd2bf7fbfc" + }, + { + "item_id": "function-late-8k", + "hash": "sha256:782c77eeef23dddfb468b97d1ea292c31121923511ff8318be30b6d6b85ab58a" + }, + { + "item_id": "function-two-blocks-8k", + "hash": "sha256:e48475a792a884755f4aa69f74adffecd1107501985e06943adc44f4e372ef86" + }, + { + "item_id": "function-negative-control-8k", + "hash": "sha256:03ed19dd6d4f7151ca223a46a1880fd1f770d69abbb90f45c50a5705c5a10fe8" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + 
"source": "built-in-context-library", + "template_id": "model-context-function-retrieval-8k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-function-retrieval-8k.jsonl", + "dataset_family": "function_retrieval", + "context_window_tokens": 8000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-128k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-128k-v1.json new file mode 100644 index 0000000..239f802 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-128k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-needle-128k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-needle-128k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:b2774c05f2acd8fa756bb1621203c5efb6a332d401aabae8fdbaca5f1ccb3810", + "item_count": 5, + "item_hashes": [ + { + "item_id": "needle-front-128k", + "hash": "sha256:7863142fd73b9ac5a0682707ca53638d6e2930ccf52dc394eb2cec6f62cab6b9" + }, + { + "item_id": "needle-middle-128k", + "hash": "sha256:0daf0357c05dedd2776b3bf86c2135ea02a4a1446e5fcf2f947a65bfaec39a9b" + }, + { + "item_id": "needle-late-128k", + "hash": "sha256:caadacb31a91c13a07855896ad60a59b456819a7b1dfa482857587010e6fe721" + }, + { + "item_id": "needle-two-facts-128k", + "hash": "sha256:591a9d13740a75639d15a289631d9f7c5d905d4ead8ba14bc9a2dd621a905832" + }, + { + "item_id": "negative-control-128k", + "hash": "sha256:6a37d63fd9c9eba9c405e9afce5a9b750d5089c7aa06d8734b13f506d2123f22" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-needle-128k-v1", + 
"source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-needle-128k.jsonl", + "dataset_family": "needle_position_retrieval", + "context_window_tokens": 128000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-16k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-16k-v1.json new file mode 100644 index 0000000..3437762 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-16k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-needle-16k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-needle-16k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:86744ee0634635c924626112dae1d99d9d092d1c8f78ba81ac2d200cc2e3442c", + "item_count": 5, + "item_hashes": [ + { + "item_id": "needle-front-16k", + "hash": "sha256:cf0322db68e103a1a8e3f8d6e9f588d198e0350ac0b2457b5e95e6f725704d8a" + }, + { + "item_id": "needle-middle-16k", + "hash": "sha256:5ff5581f42f707ddfed29d703df86fbb07ea9e3fba0c0a9cd8bd71c1cd54b038" + }, + { + "item_id": "needle-late-16k", + "hash": "sha256:370f739153a2805d1b79879d1057a32404654da6e6330646b72720f7e1801900" + }, + { + "item_id": "needle-two-facts-16k", + "hash": "sha256:0b9660d1cd3df6907c3ffc79226205579df3d261cb3c16f3fa0bec6c4c507d66" + }, + { + "item_id": "negative-control-16k", + "hash": "sha256:cbdf6963372b9af8913f37a16357f3c998f6b8afe1c674d04533c43f52783cd3" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-needle-16k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": 
"backend/data/datasets/context-needle-16k.jsonl", + "dataset_family": "needle_position_retrieval", + "context_window_tokens": 16000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-256k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-256k-v1.json new file mode 100644 index 0000000..985250c --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-256k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-needle-256k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-needle-256k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:216aea5beef6f0b8018c7e5c05e93ff26558ed89423220564fb56a589d76dfb1", + "item_count": 5, + "item_hashes": [ + { + "item_id": "needle-front-256k", + "hash": "sha256:e4ece7695c74a52d751bd35830cecd5354cbb2078056d4950eaef5f2eab9fd68" + }, + { + "item_id": "needle-middle-256k", + "hash": "sha256:687ebdc00e10e8d37286890e97251b3500d11fe10f5274f0c4b5a4217702decd" + }, + { + "item_id": "needle-late-256k", + "hash": "sha256:020c630576f93fdb8f3186abc5750a168d6d3ab7505bb5ec575cc40744780f0a" + }, + { + "item_id": "needle-two-facts-256k", + "hash": "sha256:87da526b74ede14b6da6fa35cb41d59e1619e13feefd4fb09435d3160a147aa6" + }, + { + "item_id": "negative-control-256k", + "hash": "sha256:b1c03340b6a0970ce7e850b11daec7ac63b6bd8f98f9a24d9a5de3f010b4400e" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-needle-256k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-needle-256k.jsonl", + "dataset_family": "needle_position_retrieval", + 
"context_window_tokens": 256000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-32k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-32k-v1.json new file mode 100644 index 0000000..6791b2b --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-32k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-needle-32k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-needle-32k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:210176aba4f8239589a58f9a76cc5e14353a8bfece6b1b8ba0443b3ab52ad50b", + "item_count": 5, + "item_hashes": [ + { + "item_id": "needle-front-32k", + "hash": "sha256:1a7dadd405868def0e7c698e7cc2b73460a92bac4ecdf06f9e856d719744b997" + }, + { + "item_id": "needle-middle-32k", + "hash": "sha256:b8c11b618b18b8f89ad514001bcbde42eed6cc7f33057abc45f01deb31e7cd09" + }, + { + "item_id": "needle-late-32k", + "hash": "sha256:e9e8d657103ef3375b8bea8950bbd768ca7a2972550032c22a5b1176932025ec" + }, + { + "item_id": "needle-two-facts-32k", + "hash": "sha256:bca48ce704b4295daaed4842ad95baddfbed5d02bf4dd8d8bee18f2902e45faa" + }, + { + "item_id": "negative-control-32k", + "hash": "sha256:d2f21c6348bb10ea290d224b2d3c27fd4867b96f2532e9f62157a92a5cdd0498" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-needle-32k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-needle-32k.jsonl", + "dataset_family": "needle_position_retrieval", + "context_window_tokens": 32000 + } +} diff --git 
a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-4k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-4k-v1.json new file mode 100644 index 0000000..d444feb --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-4k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-needle-4k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/positional-recall-python.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:fa7d25ac4cf43652cb3509370581d004e0705a17df0f27b23fc9763632000985", + "item_count": 5, + "item_hashes": [ + { + "item_id": "needle-front-4k", + "hash": "sha256:0270015f129d2381aaee74eb45a238f3f39f30db7943a8977321166ee58ba8cc" + }, + { + "item_id": "needle-middle-4k", + "hash": "sha256:0529e5710120af823debdbaa4bf1170f607165570d30b5cb4e526dacd2ea51e8" + }, + { + "item_id": "needle-late-4k", + "hash": "sha256:7e351b4671b7c07b734b3726be2a6dcb1b43425963f52bfc10c5140eb09a1fc8" + }, + { + "item_id": "needle-two-facts-4k", + "hash": "sha256:5aa4a2771e9808e744ae79306014931c35f89134668347e5227eee8d06684a23" + }, + { + "item_id": "negative-control-4k", + "hash": "sha256:1e452cd6a5cf2956c71f0467f4e3b0695f7e4936fa1aed86b939d14891472dd6" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-needle-4k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/positional-recall-python.jsonl", + "dataset_family": "needle_position_retrieval", + "context_window_tokens": 4000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-64k-v1.json 
b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-64k-v1.json new file mode 100644 index 0000000..3577761 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-64k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-needle-64k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-needle-64k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:d89252b7963d5c7767741507e36093d8ada92d2c9a9918bde8821485c6ad83f8", + "item_count": 5, + "item_hashes": [ + { + "item_id": "needle-front-64k", + "hash": "sha256:15af63e00ad4f2d8363d6fd5cda71ff608906400ca6600d463e64bc916857323" + }, + { + "item_id": "needle-middle-64k", + "hash": "sha256:8c04d165c0b80976ed3b96ac20dfb6ed551f01e4181ac1b4a3a988afb98eaaee" + }, + { + "item_id": "needle-late-64k", + "hash": "sha256:8484f25be1660080932be799de82fc1f37f1488b2957ea6abd83d30818f94183" + }, + { + "item_id": "needle-two-facts-64k", + "hash": "sha256:cfbba19d313e67532ba2961cf93855023bf7b62595f94011d77b414d78bc6286" + }, + { + "item_id": "negative-control-64k", + "hash": "sha256:15a55abe1c068186ed298c280fafc68b0dc44f29ba5c0897b2eb4a27af2868b4" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-needle-64k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-needle-64k.jsonl", + "dataset_family": "needle_position_retrieval", + "context_window_tokens": 64000 + } +} diff --git a/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-8k-v1.json b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-8k-v1.json new file mode 
100644 index 0000000..075d997 --- /dev/null +++ b/backend/src/benchmark-library/documents/dataset_manifest/dataset-model-context-needle-8k-v1.json @@ -0,0 +1,46 @@ +{ + "kind": "dataset_manifest", + "schema_version": "benchmark_dataset_manifest_v1", + "dataset_id": "dataset-model-context-needle-8k-v1", + "source": { + "source_type": "file", + "format": "jsonl", + "path": "data/datasets/context-needle-8k.jsonl" + }, + "canonicalization_version": "dataset_canonical_v1", + "snapshot_policy": "manifest_only", + "dataset_hash": "sha256:cdef88547ca7bb63ab23af65a3fbea38732cfd94efaa4d6b7974ca504cf60560", + "item_count": 5, + "item_hashes": [ + { + "item_id": "needle-front-8k", + "hash": "sha256:1dd9898808b4fd7e4fa037c7abf853bc49fced14c59daf46f50c6589ad0a4ab9" + }, + { + "item_id": "needle-middle-8k", + "hash": "sha256:68d98ac1d9caeb61424871fa12d5638e810653baa04e85cf3dd0d5a8530db741" + }, + { + "item_id": "needle-late-8k", + "hash": "sha256:c7cc7c6330e6a0bddaaec222190d7c2dbf047485393f61afa784b780999954f8" + }, + { + "item_id": "needle-two-facts-8k", + "hash": "sha256:c7eb20006d644b8a654fe5ce6181b08caaa111ac6f314165da521347292acbb7" + }, + { + "item_id": "negative-control-8k", + "hash": "sha256:93955355155cb98952fd58a9eb2236e9f6a329984cb82b6683afa17dca59d5e6" + } + ], + "item_manifest_ref": null, + "snapshot_blob_ref": null, + "metadata": { + "source": "built-in-context-library", + "template_id": "model-context-needle-8k-v1", + "source_file": "backend/data/datasets/frame.py", + "dataset_file": "backend/data/datasets/context-needle-8k.jsonl", + "dataset_family": "needle_position_retrieval", + "context_window_tokens": 8000 + } +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-128k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-128k-v1.json new file mode 100644 index 0000000..0148891 --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-128k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-function-retrieval-128k-v1", + "template_version": "1.0.0", + "name": "Model - Context function retrieval 128k", + "description": "Checks whether a model can retrieve full Python function blocks from front, middle, late, two-function, and negative-control prompts at about 128k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_function_retrieval_128k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "benchmark_subfamily": "function_retrieval", + "target_behavior": "positional_context_function_retrieval", + "context_window_tokens": 128000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-function-retrieval-128k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-16k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-16k-v1.json new file mode 100644 index 0000000..d301f5b --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-16k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-function-retrieval-16k-v1", + "template_version": "1.0.0", + "name": "Model - Context function retrieval 16k", + "description": "Checks whether a model can retrieve full Python function blocks from front, middle, late, two-function, and negative-control prompts at about 16k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_function_retrieval_16k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "benchmark_subfamily": "function_retrieval", + "target_behavior": "positional_context_function_retrieval", + "context_window_tokens": 16000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-function-retrieval-16k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-256k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-256k-v1.json new file mode 100644 index 0000000..d98abcd --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-256k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-function-retrieval-256k-v1", + "template_version": "1.0.0", + "name": "Model - Context function retrieval 256k", + "description": "Checks whether a model can retrieve full Python function blocks from front, middle, late, two-function, and negative-control prompts at about 256k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_function_retrieval_256k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "benchmark_subfamily": "function_retrieval", + "target_behavior": "positional_context_function_retrieval", + "context_window_tokens": 256000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-function-retrieval-256k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-32k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-32k-v1.json new file mode 100644 index 0000000..f055e91 --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-32k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-function-retrieval-32k-v1", + "template_version": "1.0.0", + "name": "Model - Context function retrieval 32k", + "description": "Checks whether a model can retrieve full Python function blocks from front, middle, late, two-function, and negative-control prompts at about 32k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_function_retrieval_32k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "benchmark_subfamily": "function_retrieval", + "target_behavior": "positional_context_function_retrieval", + "context_window_tokens": 32000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-function-retrieval-32k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-4k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-4k-v1.json new file mode 100644 index 0000000..ee995cb --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-4k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-function-retrieval-4k-v1", + "template_version": "1.0.0", + "name": "Model - Context function retrieval 4k", + "description": "Checks whether a model can retrieve full Python function blocks from front, middle, late, two-function, and negative-control prompts at about 4k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_function_retrieval_4k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "benchmark_subfamily": "function_retrieval", + "target_behavior": "positional_context_function_retrieval", + "context_window_tokens": 4000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-function-retrieval-4k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-64k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-64k-v1.json new file mode 100644 index 0000000..2494d8a --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-64k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-function-retrieval-64k-v1", + "template_version": "1.0.0", + "name": "Model - Context function retrieval 64k", + "description": "Checks whether a model can retrieve full Python function blocks from front, middle, late, two-function, and negative-control prompts at about 64k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_function_retrieval_64k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "benchmark_subfamily": "function_retrieval", + "target_behavior": "positional_context_function_retrieval", + "context_window_tokens": 64000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-function-retrieval-64k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-8k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-8k-v1.json new file mode 100644 index 0000000..9264abe --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-function-retrieval-8k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-function-retrieval-8k-v1", + "template_version": "1.0.0", + "name": "Model - Context function retrieval 8k", + "description": "Checks whether a model can retrieve full Python function blocks from front, middle, late, two-function, and negative-control prompts at about 8k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_function_retrieval_8k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "benchmark_subfamily": "function_retrieval", + "target_behavior": "positional_context_function_retrieval", + "context_window_tokens": 8000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-function-retrieval-8k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-needle-128k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-needle-128k-v1.json new file mode 100644 index 0000000..0a5caea --- /dev/null +++ 
b/backend/src/benchmark-library/documents/test_template/model-context-needle-128k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-needle-128k-v1", + "template_version": "1.0.0", + "name": "Model - Context needle 128k", + "description": "Checks whether a model can retrieve exact needle values from front, middle, late, two-fact, and negative-control prompts at about 128k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_needle_128k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "exact_match", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "target_behavior": "positional_context_needle_retrieval", + "context_window_tokens": 128000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-needle-128k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-needle-16k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-needle-16k-v1.json new file mode 100644 index 0000000..25fdd12 --- /dev/null +++ b/backend/src/benchmark-library/documents/test_template/model-context-needle-16k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": 
"benchmark_test_template_v1", + "template_id": "model-context-needle-16k-v1", + "template_version": "1.0.0", + "name": "Model - Context needle 16k", + "description": "Checks whether a model can retrieve exact needle values from front, middle, late, two-fact, and negative-control prompts at about 16k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_needle_16k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "exact_match", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "target_behavior": "positional_context_needle_retrieval", + "context_window_tokens": 16000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-needle-16k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-needle-256k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-needle-256k-v1.json new file mode 100644 index 0000000..eaf5e71 --- /dev/null +++ b/backend/src/benchmark-library/documents/test_template/model-context-needle-256k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-needle-256k-v1", + "template_version": "1.0.0", + "name": "Model - Context needle 256k", + 
"description": "Checks whether a model can retrieve exact needle values from front, middle, late, two-fact, and negative-control prompts at about 256k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_needle_256k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "exact_match", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "target_behavior": "positional_context_needle_retrieval", + "context_window_tokens": 256000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-needle-256k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-needle-32k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-needle-32k-v1.json new file mode 100644 index 0000000..97156b7 --- /dev/null +++ b/backend/src/benchmark-library/documents/test_template/model-context-needle-32k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-needle-32k-v1", + "template_version": "1.0.0", + "name": "Model - Context needle 32k", + "description": "Checks whether a model can retrieve exact needle values from front, middle, late, two-fact, and negative-control prompts at about 32k tokens.", 
+ "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_needle_32k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "exact_match", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "target_behavior": "positional_context_needle_retrieval", + "context_window_tokens": 32000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-needle-32k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-needle-4k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-needle-4k-v1.json new file mode 100644 index 0000000..5396519 --- /dev/null +++ b/backend/src/benchmark-library/documents/test_template/model-context-needle-4k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-needle-4k-v1", + "template_version": "1.0.0", + "name": "Model - Context needle 4k", + "description": "Checks whether a model can retrieve exact needle values from front, middle, late, two-fact, and negative-control prompts at about 4k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, 
+ "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_needle_4k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "exact_match", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "target_behavior": "positional_context_needle_retrieval", + "context_window_tokens": 4000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-needle-4k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-needle-64k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-needle-64k-v1.json new file mode 100644 index 0000000..7f65587 --- /dev/null +++ b/backend/src/benchmark-library/documents/test_template/model-context-needle-64k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-needle-64k-v1", + "template_version": "1.0.0", + "name": "Model - Context needle 64k", + "description": "Checks whether a model can retrieve exact needle values from front, middle, late, two-fact, and negative-control prompts at about 64k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" 
+ ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_needle_64k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": "sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "exact_match", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "target_behavior": "positional_context_needle_retrieval", + "context_window_tokens": 64000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-needle-64k-v1" + }, + "extensions": {} +} diff --git a/backend/src/benchmark-library/documents/test_template/model-context-needle-8k-v1.json b/backend/src/benchmark-library/documents/test_template/model-context-needle-8k-v1.json new file mode 100644 index 0000000..cef8eca --- /dev/null +++ b/backend/src/benchmark-library/documents/test_template/model-context-needle-8k-v1.json @@ -0,0 +1,64 @@ +{ + "kind": "test_template", + "schema_version": "benchmark_test_template_v1", + "template_id": "model-context-needle-8k-v1", + "template_version": "1.0.0", + "name": "Model - Context needle 8k", + "description": "Checks whether a model can retrieve exact needle values from front, middle, late, two-fact, and negative-control prompts at about 8k tokens.", + "operation": "chat_completion", + "required_capabilities": { + "chat_completion": true, + "streaming": false, + "tool_calling": false, + "structured_output": false + }, + "input_contract": { + "required_fields": [ + "prompt", + "expected_answer", + "expected_format" + ], + "optional_fields": [ + "system_prompt", + "tags", + "metadata" + ], + "min_items": 5 + }, + "stages": [ + { + "id": "context_needle_8k", + "type": "dataset_loop", + "iterations_per_item": 1, + "record_metrics": true, + "order": 
"sequential", + "cooldown_ms": 0, + "stop_on_error": true + } + ], + "metrics": [ + "input_tokens", + "output_tokens", + "total_tokens", + "elapsed_ms", + "first_token_ms", + "tokens_per_second", + "exact_match", + "contains_required_terms" + ], + "aggregations": [ + "mean", + "p50", + "p95", + "count" + ], + "metadata": { + "source": "built-in-context-library", + "benchmark_family": "context_window", + "target_behavior": "positional_context_needle_retrieval", + "context_window_tokens": 8000, + "recommended_temperature": 0, + "dataset_id": "dataset-model-context-needle-8k-v1" + }, + "extensions": {} +} diff --git a/backend/src/services/benchmark-runner.ts b/backend/src/services/benchmark-runner.ts index 86f9fa1..85501ab 100644 --- a/backend/src/services/benchmark-runner.ts +++ b/backend/src/services/benchmark-runner.ts @@ -484,6 +484,44 @@ function isFatalError(code: string): boolean { || ['http_400', 'http_401', 'http_403', 'http_404'].includes(code); } +function stringAtPath(value: unknown, path: string[]): string | null { + let current = value; + for (const part of path) { + const record = objectValue(current); + if (!record) return null; + current = record[part]; + } + return textFromValue(current); +} + +function classifyUpstreamError(input: { status: number; body: unknown; text: string | null }): Record { + const upstreamCode = + stringAtPath(input.body, ['error', 'code']) ?? + stringAtPath(input.body, ['code']) ?? + null; + const upstreamType = + stringAtPath(input.body, ['error', 'type']) ?? + stringAtPath(input.body, ['type']) ?? + null; + const upstreamMessage = + stringAtPath(input.body, ['error', 'message']) ?? + stringAtPath(input.body, ['message']) ?? + textFromValue(input.text) ?? + null; + const signal = `${upstreamCode ?? ''} ${upstreamType ?? ''} ${upstreamMessage ?? ''}`.toLowerCase(); + const category = signal.includes('prefill_memory_exceeded') || signal.includes('memory guard') || signal.includes('context length') + ? 
'context_prefill_memory_exceeded' + : input.status === 400 + ? 'invalid_request' + : 'upstream_http_error'; + return { + upstream_code: upstreamCode, + upstream_type: upstreamType, + upstream_message: upstreamMessage, + error_category: category + }; +} + export function buildBenchmarkRequestPayload( instantiation: Record, item: Record @@ -1109,7 +1147,8 @@ async function executeItem( ...pairMeta, item_index: executable.itemIndex, iteration: executable.iteration, - attempt + attempt, + ...classifyUpstreamError({ status: response.status, body: responseBody, text: responseBody === null ? responseText : null }) }; attemptErrors.push(issue); if (!issue.retryable || attempt >= maxAttempts) { diff --git a/backend/tests/integration/benchmark-runner.test.ts b/backend/tests/integration/benchmark-runner.test.ts index d00bc5b..093d55d 100644 --- a/backend/tests/integration/benchmark-runner.test.ts +++ b/backend/tests/integration/benchmark-runner.test.ts @@ -267,6 +267,30 @@ function installMockInferenceFetch(status: number | number[] = 200): { baseUrl: return { baseUrl: 'http://mock.local', requests, headers }; } +function installMockPrefillMemoryErrorFetch(): { baseUrl: string; requests: unknown[] } { + const requests: unknown[] = []; + vi.stubGlobal('fetch', vi.fn(async (input: RequestInfo | URL, init?: RequestInit) => { + const url = String(input); + if (url !== 'http://mock.local/v1/chat/completions') { + return new Response('', { status: 404 }); + } + const body = typeof init?.body === 'string' ? 
JSON.parse(init.body) as Record : {};
+    requests.push(body);
+    return new Response(JSON.stringify({
+      error: {
+        message: 'oMLX prefill memory guard rejected this prompt.',
+        type: 'invalid_request_error',
+        code: 'prefill_memory_exceeded'
+      },
+      type: 'error'
+    }), {
+      status: 400,
+      headers: { 'Content-Type': 'application/json' }
+    });
+  }));
+  return { baseUrl: 'http://mock.local', requests };
+}
+
 function installMockTokenSequenceFetch(tokens: number[]): { baseUrl: string; requests: unknown[] } {
   const requests: unknown[] = [];
   const queue = [...tokens];
@@ -1043,6 +1067,56 @@ describe('benchmark runner API', () => {
     await app.close();
   });
 
+  it('cancels on first fatal upstream prefill memory error and preserves provider diagnostics', async () => {
+    mockServer = installMockPrefillMemoryErrorFetch();
+    const app = createServer();
+    seedServerAndModel(mockServer.baseUrl);
+
+    const createResponse = await app.inject({
+      method: 'POST',
+      url: '/benchmark/instantiations',
+      headers: AUTH_HEADERS,
+      payload: {
+        template: benchmarkTemplate(),
+        server_id: 'srv-runner',
+        model_id: 'mock-chat',
+        runtime_profile: runtimeProfile({
+          timeout_ms: 5000,
+          cancellation_policy: { cancel_on_first_fatal_error: true }
+        }),
+        dataset: {
+          dataset_id: 'embedded-runner',
+          source: { source_type: 'inline', format: 'json' },
+          snapshot_policy: 'embedded',
+          items: [
+            { id: 'item-1', prompt: 'Run first.' },
+            { id: 'item-2', prompt: 'Run second.'
}
+          ]
+        }
+      }
+    });
+    expect(createResponse.statusCode, JSON.stringify(createResponse.json())).toBe(201);
+
+    const runResponse = await app.inject({
+      method: 'POST',
+      url: `/benchmark/instantiations/${createResponse.json().id}/run`,
+      headers: AUTH_HEADERS
+    });
+    expect(runResponse.statusCode).toBe(201);
+    const result = runResponse.json();
+    expect(result.document.status).toBe('cancelled');
+    expect(result.document.metadata.cancellation_reason).toBe('cancel_on_first_fatal_error');
+    expect(result.document.errors[0]).toMatchObject({
+      code: 'http_400',
+      upstream_code: 'prefill_memory_exceeded',
+      upstream_type: 'invalid_request_error',
+      error_category: 'context_prefill_memory_exceeded'
+    });
+    expect(result.document.stage_results[0].results[1].status).toBe('skipped');
+    expect(mockServer.requests).toHaveLength(1);
+    await app.close();
+  });
+
   it('executes a manifest_only JSONL dataset after hash verification', async () => {
     mockServer = installMockInferenceFetch();
     const app = createServer();
diff --git a/backend/tests/unit/benchmark-library.test.ts b/backend/tests/unit/benchmark-library.test.ts
index c0fc419..d640668 100644
--- a/backend/tests/unit/benchmark-library.test.ts
+++ b/backend/tests/unit/benchmark-library.test.ts
@@ -13,9 +13,12 @@ import {
   installBenchmarkLibraryDocuments,
   putBenchmarkDocumentWithLibrary
 } from '../../src/services/benchmark-library.js';
+import { resolveBenchmarkDatasetItems } from '../../src/services/benchmark-datasets.js';
 
 const moduleDir = path.dirname(fileURLToPath(import.meta.url));
 const schemaPath = path.resolve(moduleDir, '../../src/models/schema.sql');
+const contextNeedleSizes = ['4k', '8k', '16k', '32k', '64k', '128k', '256k'] as const;
+const contextFunctionRetrievalSizes = contextNeedleSizes;
 
 function templateDoc(templateId = 'user-template'): Record {
   return {
@@ -71,6 +74,85 @@ describe('benchmark library persistence', () => {
       .toBe('Agent - Codex apply_patch');
     expect(getBenchmarkDocumentOrNull('dataset_manifest',
'dataset-agent-codex-apply-patch-v1')?.document.item_count)
       .toBe(2);
+    expect(getBenchmarkDocumentOrNull('test_template', 'model-context-python-snippet-retrieval-v1')).toBeNull();
+    expect(getBenchmarkDocumentOrNull('dataset_manifest', 'dataset-model-context-python-snippet-retrieval-v1')).toBeNull();
+    for (const size of contextNeedleSizes) {
+      const templateId = `model-context-needle-${size}-v1`;
+      const datasetId = `dataset-model-context-needle-${size}-v1`;
+      expect(getBenchmarkDocumentOrNull('test_template', templateId)?.document.name)
+        .toBe(`Model - Context needle ${size}`);
+      expect(getBenchmarkDocumentOrNull('dataset_manifest', datasetId)?.document).toMatchObject({
+        item_count: 5,
+        metadata: { template_id: templateId }
+      });
+    }
+    for (const size of contextFunctionRetrievalSizes) {
+      const templateId = `model-context-function-retrieval-${size}-v1`;
+      const datasetId = `dataset-model-context-function-retrieval-${size}-v1`;
+      expect(getBenchmarkDocumentOrNull('test_template', templateId)?.document.name)
+        .toBe(`Model - Context function retrieval ${size}`);
+      expect(getBenchmarkDocumentOrNull('dataset_manifest', datasetId)?.document).toMatchObject({
+        item_count: 5,
+        metadata: { template_id: templateId }
+      });
+    }
+  });
+
+  it('loads every built-in Python context needle dataset from its file-backed manifest', () => {
+    installBenchmarkLibraryDocuments();
+    for (const size of contextNeedleSizes) {
+      const dataset = getBenchmarkDocumentOrNull('dataset_manifest', `dataset-model-context-needle-${size}-v1`)?.document;
+      expect(dataset).toBeTruthy();
+      const items = resolveBenchmarkDatasetItems({ dataset });
+      expect(items).toHaveLength(5);
+      expect(items.map((item) => item.id)).toEqual([
+        `needle-front-${size}`,
+        `needle-middle-${size}`,
+        `needle-late-${size}`,
+        `needle-two-facts-${size}`,
+        `negative-control-${size}`
+      ]);
+      expect(items[0]).toMatchObject({
+        expected_format: 'free_text',
+        metadata: { needle_position: 'front', needle_count: 1 }
+      });
+
expect(items[3]).toMatchObject({
+        metadata: { needle_count: 2 }
+      });
+      expect(items[4]).toMatchObject({
+        expected_answer: 'NOT_FOUND',
+        metadata: { needle_count: 0, needle_position: 'absent' }
+      });
+    }
+  });
+
+  it('loads every built-in Python context function retrieval dataset from its file-backed manifest', () => {
+    installBenchmarkLibraryDocuments();
+    for (const size of contextFunctionRetrievalSizes) {
+      const dataset = getBenchmarkDocumentOrNull('dataset_manifest', `dataset-model-context-function-retrieval-${size}-v1`)?.document;
+      expect(dataset).toBeTruthy();
+      const items = resolveBenchmarkDatasetItems({ dataset });
+      expect(items).toHaveLength(5);
+      expect(items.map((item) => item.id)).toEqual([
+        `function-front-${size}`,
+        `function-middle-${size}`,
+        `function-late-${size}`,
+        `function-two-blocks-${size}`,
+        `function-negative-control-${size}`
+      ]);
+      expect(items[0]).toMatchObject({
+        expected_format: 'code',
+        metadata: { function_name: '_constructor_from_mgr', function_position: 'front' }
+      });
+      expect(items[3]).toMatchObject({
+        expected_format: 'code',
+        metadata: { function_names: ['_construct_result', '_to_dict_of_blocks'] }
+      });
+      expect(items[4]).toMatchObject({
+        expected_answer: 'NOT_FOUND',
+        metadata: { function_position: 'absent' }
+      });
+    }
   });
 
   it('rebuilds user-created documents from the user library after the database is erased', () => {
diff --git a/frontend/src/pages/RunUnified.tsx b/frontend/src/pages/RunUnified.tsx
index 8806a70..8f4fc09 100644
--- a/frontend/src/pages/RunUnified.tsx
+++ b/frontend/src/pages/RunUnified.tsx
@@ -32,9 +32,11 @@ import { listModels, ModelRecord } from '../services/models-api.js';
 import { correctnessMetricTiles, type CorrectnessMetricTile } from '../services/benchmark-metric-metadata.js';
 import {
   assignRunAccents,
+  evaluateTemplateCompatibility,
   findLinkedDatasetManifest,
   mergeRunModelOptions,
   parseRunTargets,
+  selectCompatibleTemplateId,
   serializeRunTargets,
   summarizeBenchmarkMetricFailures,
   targetKey,
@@ -272,9 +274,16 @@ function ConfigRail({
     || (datasetMode === 'inline'
       ? prompt.trim().length > 0
       : datasetId.trim().length > 0 && datasetPath.trim().length > 0);
+  const templateCompatibility = new Map(templates.map((template) => [
+    template.id,
+    evaluateTemplateCompatibility(template, selectedTargets, options)
+  ]));
+  const selectedTemplateCompatibility = selectedTemplateId
+    ? templateCompatibility.get(selectedTemplateId)
+    : null;
   const canRun = selectedTargets.length >= 1 && selectedTemplateId.length > 0 && !busy && (
     datasetInputReady
-  );
+  ) && selectedTemplateCompatibility?.compatible !== false;
 
   return (